Abstract. Measures of residual risk are developed as an extension of measures of risk. They view a
random  variable  of  interest  in  concert  with  an  auxiliary  random  vector  that  helps  to  manage,  predict, 
and  mitigate  the  risk  in  the  original  variable.  Residual  risk  can  be  exemplified  as  a  quantification  of 
the  improved  situation  faced  by  a  hedging  investor  compared  to  that  of  a  single-asset  investor,  but  the 
notion  reaches  further  with  deep  connections  emerging  with  forecasting  and  generalized  regression.  We 
establish  the  fundamental  properties  in  this  framework  and  show  that  measures  of  residual  risk  along 
with  generalized  regression  can  play  central  roles  in  the  development  of  risk-tuned  approximations  of 
random  variables,  in  tracking  of  statistics,  and  in  estimation  of  the  risk  of  conditional  random  variables. 
The  paper  ends  with  dual  expressions  for  measures  of  residual  risk,  which  lead  to  further  insights  and 
a  new  class  of  distributionally  robust  optimization  models. 
1 Introduction

Quantification  of  the  “risk”  associated  with  possible  outcomes  of  a  stochastic  phenomenon,  as  described 
by  a  random  variable,  is  central  to  much  of  operations  research,  economics,  reliability  engineering,  and 
related  areas.  Measures  of  risk  are  important  tools  in  this  process  that  not  only  quantify  risk,  but  also 
facilitate  subsequent  optimization  of  the  parameters  on  which  risk  might  depend;  see  for  example  the 
recent  reviews  [13,  26,  25].  In  this  paper,  we  extend  the  concept  of  risk  measures  to  situations  where 
the  random  variable  of  interest  is  viewed  in  concert  with  a  related  random  vector  that  helps  to  manage, 
predict,  and  mitigate  the  risk  in  the  original  variable.  A  strategy  of  hedging  in  financial  engineering, 
where the effect of potential losses from an investment is reduced by taking positions in correlated
instruments, is a basic example that motivates our definition of measures of residual risk. However,
measures of residual risk extend well beyond hedging and, in fact, lead to new measures of risk as well
as deep-rooted connections with regression, risk-averse forecasting, and a multitude of applications.

¹ This material is based upon work supported in part by the U.S. Air Force Office of Scientific Research under
FA9550-11-1-0206, F1ATA01194G001, and F4FGA04094G003 as well as DARPA under HR0011412251.

For a random variable $Y$ of primary interest and a related random vector $X = (X_1, X_2, \dots, X_n)$, we
examine the situation where the goal is to find a regression function $f$ such that $Y$ is well approximated
by $f(X)$. Presumably $X$ is somehow more accessible than $Y$, making $f(X)$ an attractive surrogate for $Y$.
An example of such surrogate estimation arises in "factor models" in financial investment applications
(see for example [6, 12]), where $Y$ is the loss associated with a particular position and $X$ a vector
describing a small number of macroeconomic "factors" such as interest rates, inflation level, and GDP
growth. In forecasting, $f(X)$ might be the (random) forecast of the phenomenon described by $Y$, with
its expectation $E[f(X)]$ being an associated point prediction. In "uncertainty quantification" (see for
example [14, 7]), one considers the output, described by a random variable $Y$, of a system subject to
random input $X$ whose distribution might be assumed known. Then, a regression function $f$ leads to
an accessible surrogate estimate $f(X)$ of the unknown system output $Y$.

In surrogate estimation, traditionally, the focus has been on least-squares regression and its
quantification of the difference between $Y$ and $f(X)$ in terms of mean squared error (MSE). In a risk-averse
context where high realizations of $Y$ are undesirable beyond any compensation by occasional low
realizations, the symmetric view of errors inherent in MSE might be inappropriate and the consideration of
generalized, risk-averse regression becomes paramount. A fundamental goal would then be, for a given
measure of risk $\mathcal{R}$, to construct a regression function $f$ such that

$$\mathcal{R}(Y) \le \mathcal{R}(f(X)) + \text{possibly an error term}.$$

Initial  work  in  this  direction  includes  [22],  which  establishes  such  conservative  surrogate  estimates 
through  generalized  regression.  We  obtain  the  same  result  under  weaker  assumptions,  develop  means 
to  assess  the  goodness-of-fit  in  generalized  regression,  examine  the  stability  of  regression  functions,  and 
make  fundamental  connections  between  such  regression,  surrogate  estimation,  and  measures  of  residual 
risk. 

Generalized regression also plays a central role in situations where the random vector $X$, at least
eventually, comes under the control of a decision maker and the primary interest is then in the
conditional random variable $Y$ given $X = x$, which we denote by $Y_x$. For example, the goal might be to track
a given statistic of $Y_x$, as it varies with $x$, or to minimize $\mathcal{R}(Y_x)$ by choice of $x$, under a given measure
of risk $\mathcal{R}$. The former situation is a theme of regression analysis, but we here go beyond expectations
and quantiles, a traditional focus, and consider general classes of statistics. The latter situation is the
standard setting of risk-averse stochastic programming; see for example [13, 26]. Due to incomplete
distributional information about $Y_x$ for every $x$ as well as the computational cost of evaluating $\mathcal{R}(Y_x)$
for numerous $x$, for example within an optimization algorithm, it might be beneficial in this situation
to develop a regression function $f$ such that

$$\text{for } x \text{ in a subset of interest,} \quad \mathcal{R}(Y_x) \approx f(x).$$

Such a regression function provides an inexpensive substitute for $\mathcal{R}(Y_x)$, $x \in \mathbb{R}^n$, within optimization
models. We refer to this situation as risk tracking, which in general cannot be carried out with precision;
see [21] for a discussion in the context of superquantile/CVaR risk measures. Therefore, we look at
conservative risk tracking, where $f$ provides an (approximate) upper bound on $\mathcal{R}(Y_x)$, $x \in \mathbb{R}^n$.

In the particular case of superquantile/CVaR risk measures, kernel-based estimators for the
conditional probability density functions, integration, and inversion lead to estimates of conditional
superquantiles [29, 4, 11]. Likewise, weighted sums of conditional quantiles also give estimators of
conditional superquantiles [20, 5, 15]. More generally, there is an extensive literature on estimating
conditional distribution functions using nonparametric kernel estimators (see for example [9]) and
transformation models (see for example [10]). Of course, with an estimate of a conditional distribution
function, it is typically straightforward to estimate a statistic of $Y_x$ and/or $\mathcal{R}(Y_x)$ as parameterized by
$x$ for any law-invariant risk measure. However, it is generally difficult to obtain quality estimates of
such conditional distribution functions and so here we focus on obtaining (conservative) estimates of
statistics and risk directly.

It  is  well  known  through  convex  duality  that  many  measures  of  risk  quantify  the  risk  in  a  random 
variable  Y  to  be  the  worst-case  expected  value  of  Y  over  a  risk  envelope,  often  representing  a  set  of 
alternative  probability  distributions;  see  for  example  [26]  for  a  summary  of  results.  We  develop  parallel, 
dual  expressions  for  measures  of  residual  risk  and  show  that  knowledge  about  a  related  random  vector  X 
leads  to  a  residual  risk  envelope  that  is  typically  smaller  than  the  original  risk  envelope.  In  fact,  X  gives 
rise  to  a  new  class  of  distributionally  robust  and  computationally  tractable  optimization  models  that  is 
placed  between  an  expectation-minimization  model  and  a  distributionally  robust  model  generated  by 
a  risk  measure.  The  new  models  are  closely  allied  with  moment-matching  of  the  related  random  vector 
X.  Dual  expressions  of  measures  of  residual  risk  through  residual  risk  envelopes  provide  the  key  tool 
in  this  construction. 

The  contributions  of  the  paper  therefore  lie  in  the  introduction  of  measures  of  residual  risk,  the 
analysis  of  generalized  regression,  the  discovery  of  the  connections  between  residual  risk  and  regression, 
and  the  application  of  these  concepts  in  risk-tuned  surrogate  models,  statistic  and  risk  tracking,  and 
distributionally  robust  optimization.  In  the  process,  we  also  improve  and  simplify  prior  results  on  the 
connections  between  risk  measures  and  other  quantifiers. 

The  paper  continues  in  Section  2  with  a  review  of  basic  concepts,  definitions  of  measures  of  risk  and 
related  quantifiers,  and  a  theorem  about  connections  among  such  quantifiers  under  relaxed  assumptions. 
Section 3 defines measures of residual risk, analyzes their properties, and makes connections with
generalized regression. Sections 4 and 5 examine surrogate estimation and tracking, respectively. Section
6  discusses  duality  and  distributionally  robust  formulations  of  optimization  problems.  An  appendix 
supplements  the  paper  with  examples  of  risk  measures  and  other  quantifiers. 

2  Preliminaries  and  Risk  Quadrangle  Connections 

This section establishes terminology and provides connections among measures of risk and related
quantities. We follow the risk quadrangle framework described in [26], but relax requirements in definitions
and thereby extend the reach of that framework. We consider random variables defined on a probability
space $(\Omega, \mathcal{F}, P)$ and restrict attention to the subset $\mathcal{L}^2 := \{Y : \Omega \to \mathbb{R} \mid Y \text{ measurable},\ E[Y^2] < \infty\}$
of random variables with finite second moments. Although much of the discussion holds under weaker
assumptions, among other issues we avoid technical complications related to paired topological spaces in
duality statements under this restriction; see [28] for treatment of risk measures on more general spaces.
We equip $\mathcal{L}^2$ with the standard norm $\|\cdot\|_2$ and convergence of random variables in $\mathcal{L}^2$ will be in terms
of the corresponding (strong) topology, if not specified otherwise. We adopt a perspective concerned
about  high  values  of  random  variables,  which  is  natural  in  the  case  of  “losses”  and  “costs.”  A  trivial 
sign  change  adjusts  the  framework  to  cases  where  low  values,  instead  of  high  values,  are  undesirable. 

We examine functionals $\mathcal{F} : \mathcal{L}^2 \to (-\infty, \infty]$, with measures of risk being specific instances. As we
see below, several other functionals also play key roles. The following properties of such functionals
arise in various combinations²:

Constancy equivalence: $\mathcal{F}(Y) = c_0$ for constant random variables $Y \equiv c_0 \in \mathbb{R}$.
Convexity: $\mathcal{F}((1-\tau)Y + \tau Y') \le (1-\tau)\mathcal{F}(Y) + \tau\mathcal{F}(Y')$ for all $Y, Y'$ and $\tau \in (0,1)$.
Closedness: $\{Y \in \mathcal{L}^2 \mid \mathcal{F}(Y) \le c_0\}$ is closed for all $c_0 \in \mathbb{R}$.
Averseness: $\mathcal{F}(Y) > E[Y]$ for nonconstant $Y$.
Positive homogeneity: $\mathcal{F}(\lambda Y) = \lambda\mathcal{F}(Y)$ for every $\lambda \ge 0$ and $Y$.
Monotonicity: $\mathcal{F}(Y) \le \mathcal{F}(Y')$ when $Y \le Y'$.
Subadditivity: $\mathcal{F}(Y + Y') \le \mathcal{F}(Y) + \mathcal{F}(Y')$ for all $Y, Y'$.
Finiteness: $\mathcal{F}(Y) < \infty$ for all $Y$.


We  note  that  convexity  along  with  positive  homogeneity  is  equivalent  to  subadditivity  along  with 
positive  homogeneity.  Closedness  is  also  called  lower  semicontinuity. 
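
The equivalence just noted can be checked in two lines. The following sketch assumes only positive homogeneity, $\mathcal{F}(\lambda Y) = \lambda\mathcal{F}(Y)$ for $\lambda \ge 0$, together with the extended real-valued calculus of footnote 2; it is included as a reading aid and is not taken from the paper's own argument.

```latex
% Convexity implies subadditivity (take \tau = 1/2, then rescale by 2):
\mathcal{F}(Y + Y')
  = 2\,\mathcal{F}\bigl(\tfrac{1}{2}Y + \tfrac{1}{2}Y'\bigr)
  \le 2\bigl[\tfrac{1}{2}\mathcal{F}(Y) + \tfrac{1}{2}\mathcal{F}(Y')\bigr]
  = \mathcal{F}(Y) + \mathcal{F}(Y').

% Subadditivity implies convexity (for any \tau \in (0,1)):
\mathcal{F}\bigl((1-\tau)Y + \tau Y'\bigr)
  \le \mathcal{F}\bigl((1-\tau)Y\bigr) + \mathcal{F}(\tau Y')
  = (1-\tau)\,\mathcal{F}(Y) + \tau\,\mathcal{F}(Y').
```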

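Through conjugate duality (see [23] for a more general treatment), every closed convex functional
$\mathcal{F} : \mathcal{L}^2 \to (-\infty, \infty]$, $\mathcal{F} \not\equiv \infty$, is expressed by

$$\mathcal{F}(Y) = \sup_{Q \in \operatorname{dom}\mathcal{F}^*} \bigl\{ E[QY] - \mathcal{F}^*(Q) \bigr\} \quad \text{for } Y \in \mathcal{L}^2, \tag{1}$$

where $\mathcal{F}^* : \mathcal{L}^2 \to (-\infty, \infty]$ is the conjugate to $\mathcal{F}$, also a closed convex functional not identical to $\infty$,
given by

$$\mathcal{F}^*(Q) = \sup_{Y \in \operatorname{dom}\mathcal{F}} \bigl\{ E[QY] - \mathcal{F}(Y) \bigr\} \quad \text{for } Q \in \mathcal{L}^2, \tag{2}$$

and $\operatorname{dom}\mathcal{F}$ is the effective domain of $\mathcal{F}$, i.e., $\operatorname{dom}\mathcal{F} := \{Y \in \mathcal{L}^2 \mid \mathcal{F}(Y) < \infty\}$, and likewise for
$\operatorname{dom}\mathcal{F}^*$. Both $\operatorname{dom}\mathcal{F}$ and $\operatorname{dom}\mathcal{F}^*$ are necessarily nonempty and convex. The following facts about
such functionals are used in the paper. $\mathcal{F}$ is positively homogeneous if and only if $\mathcal{F}^*(Q) = 0$ for
$Q \in \operatorname{dom}\mathcal{F}^*$. $\mathcal{F}$ is monotonic if and only if $Q \ge 0$ for $Q \in \operatorname{dom}\mathcal{F}^*$. The elements of the subdifferential
$\partial\mathcal{F}(Y) \subset \mathcal{L}^2$ for $Y \in \mathcal{L}^2$ are those $Q$ satisfying the subgradient inequality

$$\mathcal{F}(Y') \ge \mathcal{F}(Y) + E[Q(Y' - Y)] \quad \text{for all } Y' \in \mathcal{L}^2.$$

Moreover, $\partial\mathcal{F}(Y) = \operatorname{argmax}_Q \{E[QY] - \mathcal{F}^*(Q)\}$ and this set is nonempty and weakly compact for all³
$Y \in \operatorname{int}(\operatorname{dom}\mathcal{F})$.

² Extended real-valued calculus is handled in the usual manner: $0 \cdot \infty = 0$ and $0 \cdot (-\infty) = 0$; $a \cdot \infty = \infty$ and
$a \cdot (-\infty) = -\infty$ for $a > 0$; $\infty + \infty = \infty + (-\infty) = (-\infty) + \infty = \infty$, and $-\infty + (-\infty) = -\infty$.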

We  next  turn  the  attention  to  specific  functionals,  referred  to  as  measures  of  risk,  regret,  error, 
and  deviation,  that  are  tied  together  in  quadrangles  of  risk  with  connections  to  risk  optimization  and 
statistical  estimation;  see  Diagram  1  and  the  subsequent  development. 


        risk  $\mathcal{R}$  <-->  $\mathcal{D}$  deviation
optimization   ↑↓    $\mathcal{S}$    ↓↑   estimation
      regret  $\mathcal{V}$  <-->  $\mathcal{E}$  error


Diagram 1: The Fundamental Risk Quadrangle


A measure of risk is a functional $\mathcal{R}$ that assigns to a random variable $Y \in \mathcal{L}^2$ a value $\mathcal{R}(Y)$
in $(-\infty, \infty]$ as a quantification of its risk. We give examples of measures of risk as well as other
“measures”  throughout  the  article  and  in  the  Appendix. 

$\mathcal{R}$ is regular if it satisfies constancy equivalence, convexity, closedness, and averseness.

We observe that for a regular risk measure, $\mathcal{R}(Y + c_0) = \mathcal{R}(Y) + c_0$ for any $Y \in \mathcal{L}^2$ and $c_0 \in \mathbb{R}$; see for
example [26]. Regular measures of risk are related to, but distinct from, coherent measures of risk [1]
and convex risk functions [28]; see [26] for a discussion.

The effective domain $\mathcal{Q} := \{Q \in \mathcal{L}^2 \mid \mathcal{R}^*(Q) < \infty\}$ of the conjugate $\mathcal{R}^*$ to a regular measure
of risk $\mathcal{R}$ is called a risk envelope.

Consequently, maximization in (1) takes place over the risk envelope when $\mathcal{F}$ is a regular measure of
risk $\mathcal{R}$. Moreover,

a $Q \in \mathcal{Q}$ that attains the supremum for $Y \in \mathcal{L}^2$, i.e., $\mathcal{R}(Y) = E[QY] - \mathcal{R}^*(Q)$, is called a
risk identifier at $Y$ for $\mathcal{R}$, with all such $Q$ forming the set $\partial\mathcal{R}(Y)$.

The nonemptiness of such subdifferentials ensures that there exists a risk identifier for all
$Y \in \operatorname{int}(\operatorname{dom}\mathcal{R})$.
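
To make the notions of risk envelope and risk identifier concrete, the following minimal sketch works with the superquantile/CVaR risk measure at level $\alpha$ on an equiprobable sample. It assumes the standard description of that measure's risk envelope, $\{Q \mid 0 \le Q \le 1/(1-\alpha),\ E[Q] = 1\}$ with $\mathcal{R}^* = 0$ on it, and it assumes $(1-\alpha)$ times the sample size is an integer. The function names are hypothetical and chosen for illustration only.

```python
def risk_identifier(ys, alpha):
    """Greedy candidate for a risk identifier of the superquantile (assumed envelope:
    0 <= Q <= 1/(1-alpha), E[Q] = 1). Mass is placed on the largest outcomes first."""
    n = len(ys)
    cap = 1.0 / (1.0 - alpha)   # upper bound on Q in the assumed envelope
    q = [0.0] * n
    remaining = 1.0             # portion of E[Q] = 1 still to be allocated
    for i in sorted(range(n), key=lambda j: ys[j], reverse=True):
        if remaining <= 0.0:
            break
        q[i] = min(cap, remaining * n)   # each outcome has probability 1/n
        remaining -= q[i] / n
    return q


def dual_value(ys, q):
    """E[QY] under the empirical distribution (the conjugate term is zero here)."""
    return sum(qi * yi for qi, yi in zip(q, ys)) / len(ys)


def tail_average(ys, alpha):
    """Average of the worst (1 - alpha) fraction of outcomes."""
    k = round((1.0 - alpha) * len(ys))
    return sum(sorted(ys)[-k:]) / k


sample = [5.0, 1.0, 8.0, 3.0, 2.0, 7.0, 4.0, 6.0]
q_star = risk_identifier(sample, 0.75)
print(dual_value(sample, q_star), tail_average(sample, 0.75))
```

Any other element of the envelope, such as $Q \equiv 1$, gives a value $E[QY]$ no larger than the one produced by the identifier; with $Q \equiv 1$ that value is simply the mean.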

Closely connected to risk is the notion of regret, which in many ways is more fundamental. A
measure of regret is a functional $\mathcal{V}$ that assigns to a random variable $Y \in \mathcal{L}^2$ a value $\mathcal{V}(Y)$ in $(-\infty, \infty]$
that quantifies the current displeasure with the mix of possible (future) outcomes for $Y$.

$\mathcal{V}$ is regular if it satisfies convexity and closedness as well as the property:

$\mathcal{V}(0) = 0$, but $\mathcal{V}(Y) > E[Y]$ when $Y \not\equiv 0$.

³ We denote the (strong) topological interior of $U \subset \mathcal{L}^2$ by $\operatorname{int} U$.

Regularity is here defined more broadly than in [26], where an additional condition is required. If $Y$ is
a financial loss, then $\mathcal{V}(Y)$ can be interpreted as the monetary compensation demanded for assuming
responsibility for covering the loss $Y$. We note that $\mathcal{V}(Y)$ can be viewed simply as a reorientation of
classical "utility" towards losses. Moreover, one can construct a regular measure of regret $\mathcal{V}$ from a
normalized concave utility function $u : \mathbb{R} \to \mathbb{R}$, with $u(0) = 0$ and $u(y) < y$ when $y \ne 0$, by setting
$\mathcal{V}(Y) = -E[u(-Y)]$.
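
As a small illustration of the utility construction, the sketch below uses the exponential utility $u(y) = 1 - e^{-y}$, which is an assumed example (not one taken from the paper) satisfying $u(0) = 0$ and $u(y) < y$ for $y \ne 0$. The resulting regret is $\mathcal{V}(Y) = E[e^{Y}] - 1$, evaluated here under an empirical distribution.

```python
import math


def utility(y):
    """Assumed example utility: concave, u(0) = 0, and u(y) < y for y != 0."""
    return 1.0 - math.exp(-y)


def regret_from_utility(ys, probs=None):
    """V(Y) = -E[u(-Y)] for outcomes ys with probabilities probs (uniform if omitted)."""
    n = len(ys)
    if probs is None:
        probs = [1.0 / n] * n
    return -sum(p * utility(-y) for p, y in zip(probs, ys))


losses = [-1.0, 0.5, 2.0]
mean_loss = sum(losses) / len(losses)
# The regret of a nonzero loss exceeds its expectation; the regret of Y = 0 is zero.
print(regret_from_utility(losses), mean_loss, regret_from_utility([0.0]))
```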

In regression, "error" plays the central role. A measure of error is a functional $\mathcal{E}$ that assigns to a
random variable $Y \in \mathcal{L}^2$ a value $\mathcal{E}(Y)$ in $[0, \infty]$ that quantifies its nonzeroness.

$\mathcal{E}$ is regular if it satisfies convexity and closedness as well as the property:

$\mathcal{E}(0) = 0$, but $\mathcal{E}(Y) > 0$ when $Y \not\equiv 0$.

Again, we define regularity more broadly than in [26]⁴.

An extension of the notion of standard deviation also emerges. A measure of deviation is a functional
$\mathcal{D}$ that assigns to a random variable $Y \in \mathcal{L}^2$ a value $\mathcal{D}(Y)$ in $[0, \infty]$ that quantifies its nonconstancy.

$\mathcal{D}$ is regular if it satisfies convexity and closedness as well as the property:

$\mathcal{D}(Y) = 0$ for constant random variables $Y \equiv c_0 \in \mathbb{R}$, but $\mathcal{D}(Y) > 0$ for nonconstant $Y \in \mathcal{L}^2$.

Error minimization is the focus of regression. In the case of an error measure $\mathcal{E}$, the statistic

$$\mathcal{S}(Y) := \operatorname*{argmin}_{c_0 \in \mathbb{R}} \mathcal{E}(Y - c_0) \tag{3}$$

is the quantity obtained through such minimization. It is the set of scalars, in many cases a singleton,
that best approximate $Y$ in the sense of error measure $\mathcal{E}$. We refer to the Appendix for examples of
measures  of  risk,  regret,  error,  and  deviation,  and  corresponding  statistics. 
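
For a quick numerical reading of (3), the sketch below takes the error measure $\mathcal{E}(Y) = \|Y\|_2$, an assumed choice for illustration, under an empirical distribution. The minimizing scalar should then be the mean and the minimum value the (population) standard deviation. The search bracket $[\min Y, \max Y]$ is an assumption that suits this particular error measure and need not suit others.

```python
import math
import statistics


def l2_error(zs):
    """Assumed error measure: E(Z) = sqrt(E[Z^2]) under the empirical distribution."""
    return math.sqrt(sum(z * z for z in zs) / len(zs))


def golden_section_min(g, lo, hi, tol=1e-10):
    """Golden-section search for a minimizer of a convex function g on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    gc, gd = g(c), g(d)
    while b - a > tol:
        if gc < gd:                      # a minimizer lies in [a, d]
            b, d, gd = d, c, gc
            c = b - inv_phi * (b - a)
            gc = g(c)
        else:                            # a minimizer lies in [c, b]
            a, c, gc = c, d, gd
            d = a + inv_phi * (b - a)
            gd = g(d)
    return 0.5 * (a + b)


def statistic_and_deviation(ys, error):
    """Return a minimizing c0 in (3) and the minimum error value."""
    def shifted_error(c0):
        return error([y - c0 for y in ys])
    c0 = golden_section_min(shifted_error, min(ys), max(ys))
    return c0, shifted_error(c0)


data = [1.0, 2.0, 3.0, 4.0, 10.0]
c0_star, min_error = statistic_and_deviation(data, l2_error)
print(c0_star, statistics.mean(data), min_error, statistics.pstdev(data))
```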

Before  giving  connections  among  the  various  measures  and  statistics,  we  establish  the  following 
technical  result.  The  proof  is  a  specialization  of  the  argument  in  the  proof  of  Lemma  3.3  provided 
below  and  is  therefore  omitted. 

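2.1 Lemma For a regular measure of error $\mathcal{E}$ and sequence $\{c_0^\nu\}_{\nu=1}^\infty$ of scalars, the following holds: If
$Y^\nu \in \mathcal{L}^2$ and $b^\nu \in \mathbb{R}$ converge to $Y \in \mathcal{L}^2$ and $b \in \mathbb{R}$, respectively, and $\mathcal{E}(Y^\nu - c_0^\nu) \le b^\nu$ for all $\nu$, then
$\{c_0^\nu\}_{\nu=1}^\infty$ is bounded and any accumulation point $c_0$ satisfies $\mathcal{E}(Y - c_0) \le b$.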

Connections  among  regular  measures  and  statistics  are  given  by  the  following  results,  which  extend 
the  Quadrangle  Theorem  in  [26]  to  the  broader  class  of  regular  measures  defined  here  and  also  include 
additional  characterizations  of  deviation  measures  and  statistics. 

2.2  Theorem  (risk  quadrangle  connections)  Regular  measures  of  risk,  regret,  deviation,  and  error  are 
related  as  follows: 

⁴ The extra conditions, on the behavior of certain limits, have turned out to be superfluous for the results in [26].

(i) The relations

$$\mathcal{R}(Y) = \mathcal{D}(Y) + E[Y] \quad \text{and} \quad \mathcal{D}(Y) = \mathcal{R}(Y) - E[Y] \tag{4}$$

give a one-to-one correspondence between regular measures of risk $\mathcal{R}$ and regular measures of
deviation $\mathcal{D}$. Here, $\mathcal{R}$ is positively homogeneous if and only if $\mathcal{D}$ is positively homogeneous.
Moreover, $\mathcal{R}$ is monotonic if and only if $\mathcal{D}(Y) \le \sup Y - E[Y]$ for all $Y \in \mathcal{L}^2$.

(ii) The relations

$$\mathcal{V}(Y) = \mathcal{E}(Y) + E[Y] \quad \text{and} \quad \mathcal{E}(Y) = \mathcal{V}(Y) - E[Y] \tag{5}$$

give a one-to-one correspondence between regular measures of regret $\mathcal{V}$ and regular measures of
error $\mathcal{E}$. Here, $\mathcal{V}$ is positively homogeneous if and only if $\mathcal{E}$ is positively homogeneous. Moreover,
$\mathcal{V}$ is monotonic if and only if $\mathcal{E}(Y) \le |E[Y]|$ for all $Y \le 0$.

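(iii) For any regular measure of regret $\mathcal{V}$, a regular measure of risk is obtained by

$$\mathcal{R}(Y) = \min_{c_0 \in \mathbb{R}} \bigl\{ c_0 + \mathcal{V}(Y - c_0) \bigr\}. \tag{6}$$

If $\mathcal{V}$ is positively homogeneous, then $\mathcal{R}$ is positively homogeneous. If $\mathcal{V}$ is monotonic, then $\mathcal{R}$ is
monotonic.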

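(iv) For any regular measure of error $\mathcal{E}$, a regular measure of deviation is obtained by

$$\mathcal{D}(Y) = \min_{c_0 \in \mathbb{R}} \mathcal{E}(Y - c_0). \tag{7}$$

If $\mathcal{E}$ is positively homogeneous, then $\mathcal{D}$ is positively homogeneous. If $\mathcal{E}$ satisfies $\mathcal{E}(Y) \le |E[Y]|$ for
all $Y \le 0$, then $\mathcal{D}$ satisfies $\mathcal{D}(Y) \le \sup Y - E[Y]$ for all $Y \in \mathcal{L}^2$. Moreover, $\mathcal{D}(Y + c_0) = \mathcal{D}(Y)$ for
any $Y \in \mathcal{L}^2$ and $c_0 \in \mathbb{R}$.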

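(v) For corresponding $\mathcal{V}$ and $\mathcal{E}$ according to (ii) and $Y \in \mathcal{L}^2$, the statistic

$$\mathcal{S}(Y) = \operatorname*{argmin}_{c_0 \in \mathbb{R}} \mathcal{E}(Y - c_0) = \operatorname*{argmin}_{c_0 \in \mathbb{R}} \bigl\{ c_0 + \mathcal{V}(Y - c_0) \bigr\}. \tag{8}$$

It is a nonempty closed bounded interval as long as $\mathcal{V}(Y - c_0)$, or equivalently $\mathcal{E}(Y - c_0)$, is finite
for some $c_0 \in \mathbb{R}$. Moreover, $\mathcal{S}(Y + c_0) = \mathcal{S}(Y) + \{c_0\}$ for any $Y \in \mathcal{L}^2$ and $c_0 \in \mathbb{R}$, and $\mathcal{S}(0) = \{0\}$.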

Proof. Part (i) is a direct consequence of the regularity of $\mathcal{R}$ and $\mathcal{D}$, which are unchanged from the
Quadrangle Theorem in [26].

Part (ii) is also a direct consequence of the regularity of $\mathcal{V}$ and $\mathcal{E}$, and the broadening, compared to
[26], of the class of regular measures does not require modified arguments.

The claims in Part (iii) about positive homogeneity and monotonicity follow easily and by the same
arguments as those leading to the same conclusions in [26]. However, the claims that the infimum in
(6) is attained and indeed produces a regular measure of risk require a new argument. Since

$$c_0 + \mathcal{V}(Y - c_0) = \mathcal{E}(Y - c_0) + E[Y]$$

by Part (ii), it suffices to consider minimization of $\mathcal{E}(Y - c_0)$. First, suppose that $\inf_{c_0} \mathcal{E}(Y - c_0) < \infty$.
Then, there exist $\{c_0^\nu\}_{\nu=1}^\infty$ and $\{\varepsilon^\nu\}_{\nu=1}^\infty$ such that $\varepsilon^\nu \to 0$ and

$$\mathcal{E}(Y - c_0^\nu) \le \inf_{c_0 \in \mathbb{R}} \mathcal{E}(Y - c_0) + \varepsilon^\nu \quad \text{for all } \nu.$$

Applying Lemma 2.1 with $Y^\nu = Y$, $b^\nu = \inf_{c_0 \in \mathbb{R}} \mathcal{E}(Y - c_0) + \varepsilon^\nu$, and $b = \inf_{c_0 \in \mathbb{R}} \mathcal{E}(Y - c_0)$, we obtain
that $\{c_0^\nu\}_{\nu=1}^\infty$ is bounded, that there exists a scalar $c_0^\star$ and a subsequence $\{c_0^\nu\}_{\nu \in N}$ converging to $c_0^\star$, and
that

$$\mathcal{E}(Y - c_0^\star) \le \inf_{c_0 \in \mathbb{R}} \mathcal{E}(Y - c_0).$$

Consequently, $c_0^\star \in \operatorname{argmin}_{c_0} \mathcal{E}(Y - c_0)$. Second, if $\inf_{c_0} \mathcal{E}(Y - c_0) = \infty$, then $\mathbb{R} = \operatorname{argmin}_{c_0} \mathcal{E}(Y - c_0)$.
Thus, the infimum in (6) is attained in both cases. Next, we consider closedness. Suppose that $Y^\nu \to Y$,
$c_0^\nu \in \operatorname{argmin}_{c_0} \mathcal{E}(Y^\nu - c_0)$, and $\mathcal{E}(Y^\nu - c_0^\nu) \le b \in \mathbb{R}$ for all $\nu$. Hence, $\mathcal{R}(Y^\nu) - E[Y^\nu] = \mathcal{E}(Y^\nu - c_0^\nu) \le b$
for all $\nu$. An application of Lemma 2.1 implies that there exists a scalar $c_0^\star$ and a subsequence $\{c_0^\nu\}_{\nu \in N}$
converging to $c_0^\star$, and $\mathcal{E}(Y - c_0^\star) \le b$. Consequently, $\mathcal{R}(Y) - E[Y] = \min_{c_0} \mathcal{E}(Y - c_0) \le \mathcal{E}(Y - c_0^\star) \le b$,
which establishes the closedness of $\mathcal{R}(\cdot) - E[\cdot]$. The expectation functional is finite and continuous on
$\mathcal{L}^2$ so the closedness of $\mathcal{R}$ is also established. Since constancy equivalence, convexity, and averseness
follow trivially, $\mathcal{R}$ is regular.

Part (iv) follows from Parts (i)-(iii), with the exception of the last claim, which is a consequence of
the fact that $\mathcal{R}(Y + c_0) = \mathcal{R}(Y) + c_0$ for regular measures of risk.

In Part (v), the alternative expression for $\mathcal{S}(Y)$ follows by Part (ii). The closedness and convexity
of $\mathcal{S}(Y)$ are obvious from the closedness and convexity of $\mathcal{E}$. Its nonemptiness is a consequence of
the proof of Part (iii). An application of Lemma 2.1, with $Y^\nu = Y$, $b^\nu = b = \mathcal{D}(Y)$, and $c_0^\nu \in \mathcal{S}(Y)$,
establishes the boundedness of $\mathcal{S}(Y)$. The calculus rules for $\mathcal{S}$ follow trivially from the definition of the
statistic. □

Regular measures of risk, regret, error, and deviation as well as statistics related according to
Theorem 2.2 are said to be in correspondence. In contexts where $Y$ is a monetary loss, the scalar
$c_0$ in (6) can be interpreted as the investment today in a risk-free asset that minimizes the displeasure
associated with taking responsibility for a future loss $Y$. Even in the absence of a risk-free investment
opportunity, $c_0$ could represent a certain future expenditure that allows one to offset the loss $Y$. In other
contexts where one aims to forecast a realization of $Y$, $c_0 \in \mathcal{S}(Y)$ can be viewed as a point forecast of
that realization and (6) as a tradeoff between making a low point forecast and the displeasure derived
from making an "incorrect" forecast. We provide further interpretations in the next section as we
extend the notion of risk measure.
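
The trade-off in (6) can be evaluated directly on data. The following minimal sketch assumes the regret $\mathcal{V} = E[\max\{\cdot, 0\}]/(1-\alpha)$, an empirical distribution over the sample, and that $(1-\alpha)$ times the sample size is an integer; under these assumptions the minimum in (6) should coincide with the average of the worst $(1-\alpha)$ fraction of outcomes. Function names are hypothetical.

```python
def regret(zs, alpha):
    """Assumed regret: V(Z) = E[max(Z, 0)] / (1 - alpha) under the empirical distribution."""
    return sum(max(z, 0.0) for z in zs) / (len(zs) * (1.0 - alpha))


def risk_from_regret(ys, alpha):
    """Evaluate (6): minimize c0 + V(Y - c0) over c0.

    The objective is convex and piecewise linear in c0 with kinks at the sample
    values, so it is enough to try c0 among the sample values."""
    best_value, best_c0 = None, None
    for c0 in ys:
        value = c0 + regret([y - c0 for y in ys], alpha)
        if best_value is None or value < best_value:
            best_value, best_c0 = value, c0
    return best_value, best_c0


def tail_average(ys, alpha):
    """Average of the worst (1 - alpha) fraction of outcomes."""
    k = round((1.0 - alpha) * len(ys))
    return sum(sorted(ys)[-k:]) / k


losses = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, 6.0]
risk, c0_star = risk_from_regret(losses, 0.75)
print(risk, c0_star, tail_average(losses, 0.75))
```

In this sample every $c_0$ between 4 and 6 attains the minimum, which illustrates why the statistic in (8) is in general an interval rather than a single point.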

3  Residual  Measures  of  Risk 

A  measure  of  risk  applies  to  a  single  random  variable.  However,  in  many  contexts  the  scope  needs 
to  be  widened  by  also  looking  at  other  related  random  variables  that  hopefully  might  provide  insight, 
improve  prediction,  and  reduce  “risk.” 


In this section, we introduce a measure of residual risk that extends a measure of risk to a context
involving not only a random variable $Y$, still of primary interest, but also a related random vector
$X = (X_1, \dots, X_n) \in \mathcal{L}^2_n := \mathcal{L}^2 \times \dots \times \mathcal{L}^2$. The definition is motivated by tradeoffs experienced by forecasters
and investors, but as we shall see connections with regression, surrogate models, and distributional
robustness are also profound. We start with the definition and motivations, and proceed to fundamental
properties and connections with generalized regression.


3.1  Definition  and  Motivation 

As  an  extension  of  the  trade-off  formula  (6)  for  a  measure  of  risk,  we  adopt  the  following  definition  of 
a  measure  of  residual  risk. 


3.1 Definition (measures of residual risk) For given $X \in \mathcal{L}^2_n$ and regular measure of regret $\mathcal{V}$, we
define the associated measure of residual risk (in the context of affine approximating functions) to be
the functional $\mathcal{R}(\cdot|X) : \mathcal{L}^2 \to [-\infty, \infty]$ given by

$$\mathcal{R}(Y|X) := \inf_{f} \bigl\{ E[f(X)] + \mathcal{V}(Y - f(X)) \bigm| f \text{ affine} \bigr\} \quad \text{for } Y \in \mathcal{L}^2. \tag{9}$$

The quantity $\mathcal{R}(Y|X)$ is the residual risk of $Y$ with respect to $X$ that comes from $\mathcal{V}$.

We observe that since $\mathcal{L}^2$ is a linear space, $Y - f(X) \in \mathcal{L}^2$ when $f$ is affine. Consequently, $\mathcal{R}(\cdot|X)$ is
well defined. Two examples motivate the definition:


Example 1: Prediction. Consider a situation where we would like to predict the peak electricity
demand in a region for tomorrow. Today this quantity is unknown and we can think of it as a random
variable $Y$. To help us make the prediction, temperature, dew point, and cloud cover forecasts for
tomorrow are available, possibly for different hours of the day. Suppose that the forecast gives the
joint probability distribution for these quantities viewed as a random vector $X$ and that our (random)
prediction of tomorrow's electricity demand is of the form $f(X)$, with $f$ an affine function. Our point
forecast is $E[f(X)]$. The point forecast will be used to support decisions about power generation,
where higher peak demand causes additional costs and challenges, and we therefore prefer to select $f$
such that $E[f(X)]$ is as low as possible. Of course, we need to balance this with the need to avoid
underpredicting the demand. Suppose that a regular measure of regret $\mathcal{V}$ quantifies our displeasure
with under- and overprediction. Specifically, $\mathcal{V}(Y - f(X))$ is the regret associated with $f$. For example,
if $\mathcal{V} = E[\max\{\cdot, 0\}]/(1 - \alpha)$, $\alpha \in (0, 1)$, then we are indifferent to overpredictions and feel increasing
displeasure from successively larger underpredictions. A possible approach to constructing $f$ would be
to use historical data about peak demand, temperature, dew point, and cloud cover to find an affine
function $f$ such that both $E[f(X)]$ and $\mathcal{V}(Y - f(X))$ are low when $(X, Y)$ is assumed to follow the
empirical distribution given by the data. This bi-objective optimization problem is solved in (9) through
scalarization with equal weights between the objectives. (Other weights simply indicate another choice
of $\mathcal{V}$.) The resulting optimal value is the residual risk of $Y$ with respect to $X$ and consists of the point
forecast plus a "premium" quantifying our displeasure with an "incorrect" forecast. In contrast, if $f$ is
restricted to the constant functions, then (9) reduces to (6) and no information about $X$ is included.
Specifically, historical data about peak demand is used to find a constant $c_0$ that minimizes (6), i.e.,
makes both the point forecast $c_0$ and the regret $\mathcal{V}(Y - c_0)$ low. The optimal value is the risk of $Y$,
which again consists of a point forecast plus a premium quantifying our displeasure with "getting it
wrong." A high value of risk or residual risk therefore implies that we are faced with an unpleasant
situation where the forecast for the peak demand as well as our regret about the forecast are relatively
high. The contributions from each term are easily determined in the process of solving (6) and (9).
The restriction to constant functions $f$ clearly shows that

$$\mathcal{R}(Y|X) \le \mathcal{R}(Y).$$

Consequently, the situation can only improve as one brings in information about temperature, dew
point, and cloud cover and computes the forecast $f(X)$ instead of $c_0$. Typically, the point forecast
$E[f(X)]$ will be lower than $c_0$ and the associated regret $\mathcal{V}(Y - f(X))$ will be lower than $\mathcal{V}(Y - c_0)$; at
the least, the sum of point forecast and regret will not worsen when additional information is brought in. A
quantification of the improvement is the difference between risk and residual risk. Of course, there is
nothing special about electricity demand and many other situations can be viewed similarly.
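
The data-driven approach of Example 1 can be sketched for a single explanatory variable. The code below is an illustration under stated assumptions, not the paper's method: it takes $\mathcal{V} = E[\max\{\cdot, 0\}]/(1-\alpha)$, an empirical distribution, a scalar $X$, and a heuristic bracket for the slope. For a fixed slope $c$, minimizing over the intercept $c_0$ in (9) is an instance of (6) applied to $Y - cX$, so the residual risk reduces to a one-dimensional convex minimization of $c\,E[X] + \mathcal{R}(Y - cX)$. The synthetic data and all names are hypothetical.

```python
import math
import random


def superquantile(zs, alpha):
    """Risk via (6) with the assumed regret; a minimizing c0 is one of the sample values."""
    scale = 1.0 / (len(zs) * (1.0 - alpha))
    return min(c0 + scale * sum(max(z - c0, 0.0) for z in zs) for c0 in zs)


def golden_section_min(g, lo, hi, tol=1e-9):
    """Golden-section search for a minimizer of a convex function g on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    gc, gd = g(c), g(d)
    while b - a > tol:
        if gc < gd:
            b, d, gd = d, c, gc
            c = b - inv_phi * (b - a)
            gc = g(c)
        else:
            a, c, gc = c, d, gd
            d = a + inv_phi * (b - a)
            gd = g(d)
    return 0.5 * (a + b)


def residual_risk(xs, ys, alpha, slope_bracket=(-10.0, 10.0)):
    """Approximate (9) for scalar X: minimize c * E[X] + R(Y - c X) over the slope c.

    The slope bracket is an assumption; widen it if the minimizer sits on its boundary."""
    mean_x = sum(xs) / len(xs)

    def objective(c):
        return c * mean_x + superquantile([y - c * x for x, y in zip(xs, ys)], alpha)

    slope = golden_section_min(objective, slope_bracket[0], slope_bracket[1])
    return objective(slope), slope


# Hypothetical synthetic data: Y depends on X roughly linearly, plus noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(40)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.3) for x in xs]

risk = superquantile(ys, 0.75)
res_risk, slope = residual_risk(xs, ys, 0.75)
print(risk, res_risk, slope)
```

Two sanity checks follow from the definitions: the residual risk cannot exceed the risk (take slope zero), and it cannot fall below $E[Y]$, since a regular measure of regret satisfies $\mathcal{V}(Z) \ge E[Z]$ and hence $E[f(X)] + \mathcal{V}(Y - f(X)) \ge E[Y]$ for every affine $f$.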

It is possible to consider alternatives to the expectation-based "point-forecast" $E[f(X)]$, but a
discussion of that subject carries us beyond the scope of the present paper. In the following, we
write affine functions on $\mathbb{R}^n$ in the form $c_0 + \langle c, \cdot \rangle$ for $c_0 \in \mathbb{R}$ and $c \in \mathbb{R}^n$, where
$\langle \cdot, \cdot \rangle : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ denotes the inner product. Consequently, for $X \in \mathcal{L}^2_n$, $f(X) = c_0 + \langle c, X \rangle$ is a pointwise equality
between random variables, i.e., $c_0 + \langle c, X \rangle$ is a random variable, say, $Z$ given by $Z(\omega) = c_0 + \langle c, X(\omega) \rangle$,
$\omega \in \Omega$. An interpretation of residual risk arises also in a financial context:

Example 2: Hedging investor. Consider a loss $Y$, given in present money, that an individual faces
at a future point in time. If the individual is passive, i.e., does not consider investment options that
might potentially offset a loss, she might simply assess this loss according to its regret $\mathcal{V}(Y)$, where $\mathcal{V}$ is
a regular measure of regret that quantifies the investor’s displeasure with the mix of possible losses. In
view of the earlier comment about connections between regret and utility, this quantification is
quite standard and often used when comparing various alternative losses and gains. If the individual
is more active and invests $c_0 \in \mathbb{R}$ in a risk-free asset now, then the future regret, as perceived now,
is reduced from $\mathcal{V}(Y)$ to $\mathcal{V}(Y - c_0)$ as $c_0$ will be available at the future point in time to offset the
loss $Y$. However, the upfront cost $c_0$ also needs to be considered, and the goal becomes to select the
risk-free investment $c_0$ such that $c_0 + \mathcal{V}(Y - c_0)$ is minimized. According to (6), the resulting value is
the corresponding risk $\mathcal{R}(Y)$ and every $c_0 \in \mathcal{S}(Y)$, the corresponding statistic, furnishes the amount to
be invested in the risk-free asset. To further mitigate the loss, the individual might consider purchasing
$c_i$ shares in a stock $i$ with random value $X_i$, in present terms, at the future point in time. The price
of each share is $p_i = E[X_i]$. Let $i = 1, 2, ..., n$, $c = (c_1, ..., c_n)$, $p = (p_1, ..., p_n)$, and $X = (X_1, ..., X_n)$.
Then, since $Y - [c_0 + \langle c, X\rangle]$ is the future hedged loss in present terms, the future regret, as perceived
now, is reduced from $\mathcal{V}(Y)$ to $\mathcal{V}(Y - [c_0 + \langle c, X\rangle])$. However, the upfront cost $c_0 + \langle c, p\rangle$ also needs to be
considered, and the goal becomes to select the risk-free investment $c_0$ and the risky investments $c \in \mathbb{R}^n$
that

\[ \text{minimize } \big\{ c_0 + \langle c, p\rangle + \mathcal{V}(Y - [c_0 + \langle c, X\rangle]) \big\}, \]

which according to (6) is equivalent to selecting the risky investments $c \in \mathbb{R}^n$ that

\[ \text{minimize } \big\{ \langle c, p\rangle + \mathcal{R}(Y - \langle c, X\rangle) \big\}. \]

The optimal values of these problems are the residual risk $\mathcal{R}(Y|X)$. The possibly nonoptimal choices
of setting $c = 0$, or both $c_0 = 0$ and $c = 0$, correspond to forfeiting moderation of the future loss through
risky investments, or through both risk-free and risky investments, and give the values $\mathcal{R}(Y)$ and $\mathcal{V}(Y)$,
respectively. Consequently,

\[ \mathcal{R}(Y|X) \le \mathcal{R}(Y) \le \mathcal{V}(Y). \]

The differences between these quantities reflect the degree of benefit an investor derives by departing
from the passive strategy of $c_0 = 0$ and $c = 0$ to various degrees. Of course, the ability to reduce risk by
taking positions in the stocks is determined by the dependence between $Y$ and $X$. In a decision making
situation, when comparing two candidate random variables $Y$ and $Y'$, an individual’s preference of one
over the other heavily depends on whether the comparison is carried out at the level of regret, i.e.,
$\mathcal{V}(Y)$ versus $\mathcal{V}(Y')$, as in the case of traditional expected utility theory, at the level of risk, i.e., $\mathcal{R}(Y)$
versus $\mathcal{R}(Y')$, as in the case of much of modern risk analysis in finance, or at the level of residual risk,
i.e., $\mathcal{R}(Y|X)$ versus $\mathcal{R}(Y'|X)$. The latter perspective might provide a more comprehensive picture of the
“risk” faced by the decision maker as it accounts for the opportunities that might exist to offset losses.
The focus on residual risk in decision making is related to the extensive literature on real options (see
for example [8] and references therein), where losses and gains are also viewed in concert with other
decisions.
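The chain of inequalities in Example 2 can be checked numerically. The sketch below is illustrative only: it assumes the superquantile/CVaR risk measure with the regret $\mathcal{V}(Y) = E[\max\{0, Y\}]/(1-\alpha)$ from the standard superquantile quadrangle (that regret is not stated in this section), a single stock ($n = 1$), synthetic data, and expectations replaced by sample averages. The names `cvar`, `phi`, and the data-generating model are hypothetical.

```python
import random

def cvar(losses, alpha):
    """Empirical CVaR via min over c0 of c0 + E[max(0, Y - c0)] / (1 - alpha).
    The objective is piecewise linear and convex in c0, so it suffices to
    evaluate it at the sample points."""
    ys = sorted(losses)
    n = len(ys)
    best = float("inf")
    tail_sum = 0.0  # sum of the sample points located after position k
    for k in range(n - 1, -1, -1):
        c0 = ys[k]
        excess = tail_sum - (n - 1 - k) * c0  # sum of max(0, y - c0)
        best = min(best, c0 + excess / ((1.0 - alpha) * n))
        tail_sum += c0
    return best

rng = random.Random(0)
n, alpha = 500, 0.9
x = [rng.gauss(0.0, 1.0) for _ in range(n)]            # stock value X (assumed model)
y = [1.0 + 0.8 * xi + rng.gauss(0.0, 0.5) for xi in x]  # loss Y, correlated with X

price = sum(x) / n      # p = E[X], here a sample average
mean_y = sum(y) / n

def phi(c):
    """Cost of c shares plus risk of the hedged loss: <c, p> + R(Y - <c, X>)."""
    return c * price + cvar([yi - c * xi for yi, xi in zip(y, x)], alpha)

# phi is convex in c, so a ternary search locates a minimizer on [-10, 10].
lo, hi = -10.0, 10.0
for _ in range(100):
    m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
    if phi(m1) <= phi(m2):
        hi = m2
    else:
        lo = m1

shares = 0.5 * (lo + hi)
residual = phi(shares)                                   # R(Y|X)
risk = cvar(y, alpha)                                    # R(Y), i.e., c = 0
regret = sum(max(0.0, yi) for yi in y) / ((1.0 - alpha) * n)  # V(Y), c0 = 0 and c = 0

print("E[Y] =", mean_y, " R(Y|X) =", residual, " R(Y) =", risk, " V(Y) =", regret)
```

The gaps between the three printed quantities correspond to the benefit of the risk-free investment and of the additional stock position, respectively.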

3.2  Basic  Properties 

We  continue  in  this  subsection  by  examining  the  properties  of  measures  of  residual  risk.  We  often 
require  the  nondegeneracy  of  the  auxiliary  random  vector  X,  which  is  defined  as  follows: 

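3.2 Definition (nondegeneracy) We will say that an $n$-dimensional random vector $X = (X_1, X_2, ..., X_n) \in \mathcal{L}^2_n$
is nondegenerate if

\[ \langle c, X\rangle \text{ is a constant} \implies c = 0 \in \mathbb{R}^n. \]

We note that nondegeneracy is equivalent to linear independence of $1, X_1, X_2, ..., X_n$ as elements of $\mathcal{L}^2$.
For $X \in \mathcal{L}^2_n$, we also define the subspace

\[ \mathcal{Y}(X) := \{ Y \in \mathcal{L}^2 \mid Y = c_0 + \langle c, X\rangle,\ c_0 \in \mathbb{R},\ c \in \mathbb{R}^n \}. \]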

Before giving the main properties, we establish the following technical result which covers and
extends Lemma 2.1.

3.3 Lemma For a regular measure of error $\mathcal{E}$ and sequence $\{(c_0^\nu, c^\nu)\}_{\nu=1}^\infty$, with $c_0^\nu \in \mathbb{R}$ and $c^\nu \in \mathbb{R}^n$
for all $\nu$, the following holds:

If $Y^\nu \in \mathcal{L}^2$, $X^\nu \in \mathcal{L}^2_n$, and $b^\nu \in \mathbb{R}$ converge to $Y \in \mathcal{L}^2$, $X \in \mathcal{L}^2_n$, and $b \in \mathbb{R}$, respectively, $X$
is nondegenerate, and $\mathcal{E}(Y^\nu - [c_0^\nu + \langle c^\nu, X^\nu\rangle]) \le b^\nu$ for all $\nu$, then $\{(c_0^\nu, c^\nu)\}_{\nu=1}^\infty$ is bounded and any
accumulation point $(c_0, c)$ satisfies $\mathcal{E}(Y - [c_0 + \langle c, X\rangle]) \le b$.

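Proof. For the sake of a contradiction suppose that $\{(c_0^\nu, c^\nu)\}_{\nu=1}^\infty$ is not bounded. Then, there exists
a subsequence $\{(c_0^\nu, c^\nu)\}_{\nu \in N}$ such that $\|(c_0^\nu, c^\nu)\| \ge 1$ for all $\nu \in N$, $\|(c_0^\nu, c^\nu)\| \to \infty$ along $N$, and
$(c_0^\nu, c^\nu)/\|(c_0^\nu, c^\nu)\| \to (a_0, a) \ne 0$ along $N$, with $a_0 \in \mathbb{R}$ and $a \in \mathbb{R}^n$. Let $\lambda^\nu = 1/\|(c_0^\nu, c^\nu)\|$. Since $\mathcal{E}$ is
convex and $\mathcal{E}(0) = 0$, we have that

\[ \mathcal{E}(\lambda Y) = \mathcal{E}(\lambda Y + (1-\lambda) 0) \le \lambda \mathcal{E}(Y) \quad \text{for } Y \in \mathcal{L}^2 \text{ and } \lambda \in [0, 1]. \]

Consequently, for $\nu \in N$,

\[ \lambda^\nu b^\nu \ge \lambda^\nu \mathcal{E}(Y^\nu - [c_0^\nu + \langle c^\nu, X^\nu\rangle]) \ge \mathcal{E}(\lambda^\nu Y^\nu - [\lambda^\nu c_0^\nu + \langle \lambda^\nu c^\nu, X^\nu\rangle]) \ge 0. \]

Since $\lambda^\nu \to 0$ along $N$, we have $\lambda^\nu b^\nu \to 0$ and $\lambda^\nu Y^\nu - [\lambda^\nu c_0^\nu + \langle \lambda^\nu c^\nu, X^\nu\rangle] \to -[a_0 + \langle a, X\rangle]$ along $N$.
These facts together with the closedness of $\mathcal{E}$ imply that $\mathcal{E}(-[a_0 + \langle a, X\rangle]) = 0$ and therefore also that
$a_0 + \langle a, X\rangle = 0$. Since $X$ is nondegenerate, this implies that $a = 0$. Then, however, $a_0 = 0$, and $(a_0, a) = 0$,
which is a contradiction. Thus, $\{(c_0^\nu, c^\nu)\}_{\nu=1}^\infty$ is bounded. The inequality $\mathcal{E}(Y - [c_0 + \langle c, X\rangle]) \le b$ follows
directly from the closedness of $\mathcal{E}$. □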

Fundamental  properties  of  measures  of  residual  risk  are  given  next. 


3.4 Theorem (residual-risk properties) For given $X \in \mathcal{L}^2_n$ and regular measures of regret $\mathcal{V}$, risk $\mathcal{R}$,
deviation $\mathcal{D}$, and error $\mathcal{E}$ in correspondence, the following facts about the associated measure of residual
risk $\mathcal{R}(\cdot|X)$ hold:

(i) $\mathcal{R}(Y|X)$ satisfies the alternative formulae

\[
\begin{aligned}
\mathcal{R}(Y|X) &= \inf_{c \in \mathbb{R}^n} \big\{ \langle c, E[X]\rangle + \mathcal{R}(Y - \langle c, X\rangle) \big\} \\
&= E[Y] + \inf_{c \in \mathbb{R}^n} \mathcal{D}(Y - \langle c, X\rangle) \\
&= E[Y] + \inf_{c_0 \in \mathbb{R},\, c \in \mathbb{R}^n} \mathcal{E}(Y - [c_0 + \langle c, X\rangle]).
\end{aligned}
\]

(ii) $E[Y] \le \mathcal{R}(Y|X) \le \mathcal{R}(Y) \le \mathcal{V}(Y)$ for all $Y \in \mathcal{L}^2$.

(iii) $\mathcal{R}(\cdot|X)$ is convex and satisfies the constant equivalence property.

(iv) If $\mathcal{V}$ is positively homogeneous, then $\mathcal{R}(\cdot|X)$ is positively homogeneous. If $\mathcal{V}$ is monotonic, then
$\mathcal{R}(\cdot|X)$ is monotonic.

(v) If $X$ is a constant random vector, then $\mathcal{R}(Y|X) = \mathcal{R}(Y)$.

(vi) If $X$ is nondegenerate, then $\mathcal{R}(\cdot|X)$ is closed and the infimum in its definition as well as in the
alternative formulae in (i) is attained.

(vii) $\mathcal{R}(Y|X) = E[Y]$ if $Y \in \mathcal{Y}(X)$, whereas $\mathcal{R}(Y|X) > E[Y]$ if $Y \notin \mathcal{Y}(X)$ and $X$ is nondegenerate.

Proof. Part (i) is a direct consequence of the relationships between corresponding measures given in
Theorem 2.2. The first inequality in Part (ii) is a consequence of the fact that $\mathcal{V} \ge E[\cdot]$ on $\mathcal{L}^2$. The
second inequality follows by selecting the possibly nonoptimal solution $c = 0$ in the first alternative
formula and the third inequality by selecting $c_0 = 0$ and $c = 0$ in the definition

\[ \mathcal{R}(Y|X) = \inf_{c_0 \in \mathbb{R},\, c \in \mathbb{R}^n} \big\{ c_0 + \langle c, E[X]\rangle + \mathcal{V}(Y - [c_0 + \langle c, X\rangle]) \big\}. \]

Part (v) is obtained from the first alternative formula in Part (i) and the fact that $\mathcal{R}(Y + k) =
\mathcal{R}(Y) + k$ for any $k \in \mathbb{R}$.

For Part (iii), convexity follows since the function $(c_0, c, Y) \mapsto c_0 + \langle c, E[X]\rangle + \mathcal{V}(Y - [c_0 + \langle c, X\rangle])$
is convex on $\mathbb{R} \times \mathbb{R}^n \times \mathcal{L}^2$; see for example [23, Theorem 1]. Constant equivalence is a consequence of
Part (ii) and the fact that $c_0 = E[Y] \le \mathcal{R}(Y|X) \le \mathcal{R}(Y) = c_0$ when $Y = c_0$.

Part  (iv)  follows  trivially  from  the  definitions  of  positive  homogeneity  and  monotonicity  and  Part 
(v)  is  likewise  straightforwardly  obtained. 

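Next we address Part (vi). First, we consider the minimization of $\mathcal{E}(Y - [c_0 + \langle c, X\rangle])$. Suppose
that $\inf_{c_0, c} \mathcal{E}(Y - [c_0 + \langle c, X\rangle]) < \infty$. Then, there exist $\{(c_0^\nu, c^\nu)\}_{\nu=1}^\infty$, with $c_0^\nu \in \mathbb{R}$ and $c^\nu \in \mathbb{R}^n$, as well
as $\{\varepsilon^\nu\}_{\nu=1}^\infty$ such that $\varepsilon^\nu \to 0$ and

\[ \mathcal{E}(Y - [c_0^\nu + \langle c^\nu, X\rangle]) \le \inf_{c_0 \in \mathbb{R},\, c \in \mathbb{R}^n} \mathcal{E}(Y - [c_0 + \langle c, X\rangle]) + \varepsilon^\nu \quad \text{for all } \nu. \]

Applying Lemma 3.3 with $Y^\nu = Y$, $X^\nu = X$, $b^\nu = \inf_{c_0, c} \mathcal{E}(Y - [c_0 + \langle c, X\rangle]) + \varepsilon^\nu$, and
$b = \inf_{c_0, c} \mathcal{E}(Y - [c_0 + \langle c, X\rangle])$, we obtain that $\{(c_0^\nu, c^\nu)\}_{\nu=1}^\infty$ is bounded, that there exist $c_0^* \in \mathbb{R}$, $c^* \in \mathbb{R}^n$, and a
subsequence $\{(c_0^\nu, c^\nu)\}_{\nu \in N}$, with $(c_0^\nu, c^\nu) \to (c_0^*, c^*)$ along $N$, and that

\[ \mathcal{E}(Y - [c_0^* + \langle c^*, X\rangle]) \le \inf_{c_0 \in \mathbb{R},\, c \in \mathbb{R}^n} \mathcal{E}(Y - [c_0 + \langle c, X\rangle]). \]

Consequently, $(c_0^*, c^*) \in \operatorname{argmin}_{c_0, c} \mathcal{E}(Y - [c_0 + \langle c, X\rangle])$. If $\inf_{c_0, c} \mathcal{E}(Y - [c_0 + \langle c, X\rangle]) = \infty$, then
$\mathbb{R}^{n+1} = \operatorname{argmin}_{c_0, c} \mathcal{E}(Y - [c_0 + \langle c, X\rangle])$. Thus, the error minimization in Part (i) is attained when $X$
is nondegenerate. In view of (5), the infimum in the definition of residual risk is also attained. A
nearly identical argument shows that the infima in the alternative formulae in (i) are also attained.
Second, we consider closedness. Suppose that $Y^\nu \to Y$, $(c_0^\nu, c^\nu) \in \operatorname{argmin}_{c_0, c} \mathcal{E}(Y^\nu - [c_0 + \langle c, X\rangle])$, and
$\mathcal{E}(Y^\nu - [c_0^\nu + \langle c^\nu, X\rangle]) \le b \in \mathbb{R}$ for all $\nu$. Hence, $\mathcal{R}(Y^\nu|X) - E[Y^\nu] = \mathcal{E}(Y^\nu - [c_0^\nu + \langle c^\nu, X\rangle]) \le b$ for
all $\nu$. An application of Lemma 3.3 implies that there exist $c_0^* \in \mathbb{R}$, $c^* \in \mathbb{R}^n$, and a subsequence
$\{(c_0^\nu, c^\nu)\}_{\nu \in N}$, with $(c_0^\nu, c^\nu) \to (c_0^*, c^*)$ along $N$, and $\mathcal{E}(Y - [c_0^* + \langle c^*, X\rangle]) \le b$. Consequently, $\mathcal{R}(Y|X) - E[Y] =
\min_{c_0, c} \mathcal{E}(Y - [c_0 + \langle c, X\rangle]) \le \mathcal{E}(Y - [c_0^* + \langle c^*, X\rangle]) \le b$, which establishes the closedness of $\mathcal{R}(\cdot|X) - E[\cdot]$.
The expectation functional is finite and continuous on $\mathcal{L}^2$ so the closedness of $\mathcal{R}(\cdot|X)$ is also established.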




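Finally, we consider Part (vii). Suppose that $Y \in \mathcal{Y}(X)$. Then, there exist $c_0 \in \mathbb{R}$ and $c \in \mathbb{R}^n$
such that $Y = c_0 + \langle c, X\rangle$. In view of Parts (i) and (ii),

\[
\begin{aligned}
E[Y] \le \mathcal{R}(Y|X) &= \inf_{c \in \mathbb{R}^n} \big\{ \langle c, E[X]\rangle + \mathcal{R}(Y - \langle c, X\rangle) \big\} \\
&\le \langle c, E[X]\rangle + \mathcal{R}(Y - \langle c, X\rangle) \\
&= \langle c, E[X]\rangle + \mathcal{R}(c_0) \\
&= c_0 + \langle c, E[X]\rangle = E[Y],
\end{aligned}
\]

which establishes the first claim. Suppose that $Y \notin \mathcal{Y}(X)$. Then, $Y - \langle c, X\rangle \ne c_0$ for any $c_0 \in \mathbb{R}$
and $c \in \mathbb{R}^n$. Consequently, $Y - \langle c, X\rangle$ is not a constant for any $c \in \mathbb{R}^n$, which by the averseness of
$\mathcal{R}$ implies that $\mathcal{R}(Y - \langle c, X\rangle) > E[Y - \langle c, X\rangle]$. If $X$ is nondegenerate, then by Part (vi) there exists
$\bar{c} \in \mathbb{R}^n$ such that

\[
\begin{aligned}
\mathcal{R}(Y|X) &= \inf_{c \in \mathbb{R}^n} \big\{ \langle c, E[X]\rangle + \mathcal{R}(Y - \langle c, X\rangle) \big\} \\
&= \langle \bar{c}, E[X]\rangle + \mathcal{R}(Y - \langle \bar{c}, X\rangle) \\
&> \langle \bar{c}, E[X]\rangle + E[Y - \langle \bar{c}, X\rangle] = E[Y],
\end{aligned}
\]

which completes the proof. □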

We see from Theorem 3.4(i) that a measure of residual risk decomposes into an “irreducible” value
$E[Y]$ and a quantification of “nonzeroness” by an error measure of the difference between $Y$ and an
affine model in terms of $X$, which is reduced as much as possible by choosing $c_0, c$ optimally.

A fundamental consequence of Theorem 3.4 is that for a nondegenerate $X$,

a measure of residual risk is also a closed, convex, and constancy equivalent measure of risk.

The constructed risk measure is positively homogeneous if the underlying risk measure is positively
homogeneous. Monotonicity is likewise inherited. When $X$ is nondegenerate, it is also averse outside
$\mathcal{Y}(X)$.

Further  insight  is  revealed  by  the  following  trivial  but  informative  example. 


Example 3: Normal random variables. Suppose that $X$ and $Y$ are normal random variables with
mean values $\mu_X$ and $\mu_Y$, respectively, and standard deviations $\sigma_X > 0$ and $\sigma_Y$, respectively. We here
temporarily let $X$ be scalar valued. Let $\rho \in [-1, 1]$ be the correlation coefficient between $X$ and $Y$,
and $G_Y(\alpha)$ be the $\alpha$-quantile of $Y$. We recall that for $\alpha \in [0, 1)$ the superquantile/CVaR risk measure is
$\mathcal{R}(Y) = \int_\alpha^1 G_Y(\beta)\, d\beta / (1 - \alpha)$; see Appendix. For this risk measure, it is straightforward to show that
the residual risk of $Y$ with respect to $X$ takes the simple form

\[ \mathcal{R}(Y|X) = \mu_Y + \sigma_Y \sqrt{1 - \rho^2}\, \frac{\phi(\Phi^{-1}(\alpha))}{1 - \alpha}, \]

where $\phi$ and $\Phi$ are the probability density and cumulative distribution functions of a standard normal
random variable, respectively. The value of $c$ that attains the minimum in item (i) of Theorem 3.4 is
$\rho \sigma_Y / \sigma_X$. We note that for $\rho = \pm 1$ the residual risk is reduced to its minimum possible level of $\mu_Y$.
The other extreme is attained for $\rho = 0$, when $\mathcal{R}(Y|X) = \mathcal{R}(Y)$. In view of the previously discussed
hedging investor, we note that for perfectly correlated investment possibilities, “risk” can be eliminated.
The sign of the correlation coefficient is immaterial as both short and long positions are allowed. In
the case of independent assets, no hedging possibility exists and the investor faces the inherent risk in $Y$.
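The closed form in Example 3 is easy to evaluate. The following sketch assumes that $(X, Y)$ is jointly normal, so that $Y - cX$ is normal for every $c$, and that $\alpha \in (0, 1)$; the function names and the numerical values are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

def normal_cvar(mu, sigma, alpha):
    """Superquantile/CVaR of a normal loss with mean mu and standard
    deviation sigma, at level alpha in (0, 1)."""
    std = NormalDist()
    return mu + sigma * std.pdf(std.inv_cdf(alpha)) / (1.0 - alpha)

def normal_residual_cvar(mu_y, sigma_y, rho, alpha):
    """Residual risk of Y with respect to a scalar X under the joint
    normality assumption: only sigma_y * sqrt(1 - rho^2) remains unhedged."""
    return normal_cvar(mu_y, sigma_y * sqrt(1.0 - rho * rho), alpha)

def optimal_slope(sigma_x, sigma_y, rho):
    """Position c = rho * sigma_y / sigma_x attaining the minimum."""
    return rho * sigma_y / sigma_x

mu_y, sigma_y, alpha = 1.0, 2.0, 0.9
for rho in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(rho, normal_residual_cvar(mu_y, sigma_y, rho, alpha))
```

The printed values are symmetric in the sign of $\rho$, equal the risk of $Y$ at $\rho = 0$, and drop to $\mu_Y$ at $\rho = \pm 1$, in line with the discussion above.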

We  next  examine  the  case  when  Y  is  statistically  independent  of  X  in  the  general  case.  We  start 
with  terminology. 

3.5 Definition (representation of risk identifiers) A risk identifier $Q^Y$ at $Y \in \mathcal{L}^2$ for a regular measure
of risk will be called representable if there exists a Borel-measurable function $h_Y : \mathbb{R} \to \mathbb{R}$, possibly
depending on $Y$, such that

\[ Q^Y(\omega) = h_Y(Y(\omega)) \quad \text{for a.e. } \omega \in \Omega. \]

For first-order and second-order superquantile/CVaR risk measures there exist representable risk
identifiers for all $Y \in \mathcal{L}^2$; see the Appendix.

3.6 Proposition Suppose that $Z, Y \in \mathcal{L}^2$ are statistically independent. If $Q^Y$ is a representable risk
identifier at $Y$ for a regular measure of risk, then $Q^Y$ and $Z$ are statistically independent.

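Proof. Since $Q^Y$ is a representable risk identifier, there exists a Borel-measurable $h_Y : \mathbb{R} \to \mathbb{R}$ such
that for almost every $\omega \in \Omega$, $h_Y(Y(\omega)) = Q^Y(\omega)$. For Borel sets $C, D \subset \mathbb{R}$,

\[
\begin{aligned}
P\{\omega \in \Omega \mid Q^Y(\omega) \in C,\ Z(\omega) \in D\} &= P\{\omega \in \Omega \mid h_Y(Y(\omega)) \in C,\ Z(\omega) \in D\} \\
&= P\{\omega \in \Omega \mid Y(\omega) \in h_Y^{-1}(C),\ Z(\omega) \in D\} \\
&= P\{\omega \in \Omega \mid Y(\omega) \in h_Y^{-1}(C)\}\, P\{\omega \in \Omega \mid Z(\omega) \in D\} \\
&= P\{\omega \in \Omega \mid Q^Y(\omega) \in C\}\, P\{\omega \in \Omega \mid Z(\omega) \in D\},
\end{aligned}
\]

where the third equality follows from the fact that $h_Y^{-1}(C)$ is a Borel set and $Z$ and $Y$ are independent.
Consequently, $Q^Y$ and $Z$ are independent. □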

3.7 Theorem (measures of residual risk under independence) Suppose that $Y \in \mathcal{L}^2$ and $X \in \mathcal{L}^2_n$ are
statistically independent, and $\mathcal{R}$ is a regular measure of risk with a representable risk identifier at $Y$
and $Y \in \operatorname{int}(\operatorname{dom} \mathcal{R})$. Then,

\[ \mathcal{R}(Y|X) = \mathcal{R}(Y). \]

Proof. By Theorem 3.4, $\mathcal{R}(Y|X) = \inf_{c \in \mathbb{R}^n} \varphi(c)$, where we define $\varphi(c) = \langle c, E[X]\rangle + \mathcal{R}(Y - \langle c, X\rangle)$.
Hence, it suffices to show that $c = 0$ is an optimal solution of this problem. The assumption that
$Y \in \operatorname{int}(\operatorname{dom} \mathcal{R})$ ensures that $\partial\mathcal{R}(Y)$ is nonempty and that the subdifferential formula (see for example
[23, Theorem 19])

\[ \partial\varphi(c) = \big\{ E[X] - E[QX] \mid Q \in \partial\mathcal{R}(Y - \langle c, X\rangle) \big\} \]

holds. Consequently, by convexity of $\varphi$, $c = 0$ minimizes $\varphi$ if and only if $0 \in \partial\varphi(0)$. Since there exists
a risk identifier $Q \in \partial\mathcal{R}(Y)$ that is independent of $X$ by Proposition 3.6, the conclusion follows by the
fact that $E[Q] = 1$ for every $Q \in \mathcal{Q}$ and $E[QX] = E[Q]E[X] = E[X]$ for such an independent $Q$. □
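Theorem 3.7 can be illustrated on a small discrete example. The sketch below assumes the superquantile/CVaR risk measure (for which representable risk identifiers exist, as noted after Definition 3.5), a scalar $X$, and hypothetical atoms and probabilities; independence is imposed by using the product distribution. It checks on a grid that $c = 0$ minimizes $\varphi(c) = \langle c, E[X]\rangle + \mathcal{R}(Y - \langle c, X\rangle)$.

```python
def weighted_cvar(values, probs, alpha):
    """CVaR of a discrete loss via min over c0 of
    c0 + E[max(0, Z - c0)] / (1 - alpha). The objective is piecewise
    linear and convex, so evaluating it at the atoms is enough."""
    def objective(c0):
        tail = sum(p * max(0.0, z - c0) for z, p in zip(values, probs))
        return c0 + tail / (1.0 - alpha)
    return min(objective(c0) for c0 in values)

# Hypothetical marginal distributions of the loss Y and the hedge X.
y_atoms, y_probs = [0.0, 1.0, 4.0], [0.5, 0.3, 0.2]
x_atoms, x_probs = [-1.0, 2.0], [0.6, 0.4]
alpha = 0.7
mean_x = sum(x * p for x, p in zip(x_atoms, x_probs))

def phi(c):
    """phi(c) = c * E[X] + CVaR(Y - c * X) under the product (independent) law."""
    values = [y - c * x for y in y_atoms for x in x_atoms]
    probs = [py * px for py in y_probs for px in x_probs]
    return c * mean_x + weighted_cvar(values, probs, alpha)

risk_y = weighted_cvar(y_atoms, y_probs, alpha)
grid = [k / 20.0 for k in range(-60, 61)]      # c in [-3, 3]
gap = min(phi(c) for c in grid) - phi(0.0)     # zero up to rounding if c = 0 is optimal

print("R(Y) =", risk_y, " phi(0) =", phi(0.0), " best grid value minus phi(0) =", gap)
```

No position in $X$ improves on $c = 0$ here, so the residual risk coincides with the risk of $Y$, as the theorem asserts.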

3.3  Residual  Statistics  and  Regression 

In  the  same  manner  as  a  statistic  S(Y)  furnishes  optimal  solutions  in  the  trade-off  formulae  (6)  and 
(7),  the  extended  notion  of  residual  statistic  furnishes  optimal  solutions  in  (9): 

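3.8 Definition (residual statistic) For given $X \in \mathcal{L}^2_n$ and a regular measure of regret $\mathcal{V}$, we define an
associated residual statistic to be the subset of $\mathbb{R}^{n+1}$ given by

\[ \mathcal{S}^0(Y|X) := \operatorname*{argmin}_{c_0 \in \mathbb{R},\, c \in \mathbb{R}^n} \big\{ c_0 + \langle c, E[X]\rangle + \mathcal{V}(Y - [c_0 + \langle c, X\rangle]) \big\} \quad \text{for } Y \in \mathcal{L}^2. \]

If in addition $\mathcal{R}$ is a corresponding measure of risk, then an associated partial residual statistic is the
subset of $\mathbb{R}^n$ given by

\[ \mathcal{S}(Y|X) := \operatorname*{argmin}_{c \in \mathbb{R}^n} \big\{ \langle c, E[X]\rangle + \mathcal{R}(Y - \langle c, X\rangle) \big\} \quad \text{for } Y \in \mathcal{L}^2. \]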

The  motivation  for  the  terminology  “partial  residual  statistic”  becomes  apparent  from  the  following 
properties. 

3.9 Theorem (residual statistic properties) Suppose that $X \in \mathcal{L}^2_n$ and $\mathcal{V}$, $\mathcal{R}$, $\mathcal{E}$, and $\mathcal{D}$ are
corresponding regular measures of regret, risk, error, and deviation, respectively, with statistic $\mathcal{S}$. Then, the
residual statistic $\mathcal{S}^0(\cdot|X)$ and partial residual statistic $\mathcal{S}(\cdot|X)$ satisfy for $Y \in \mathcal{L}^2$:

(i) $\mathcal{S}^0(Y|X)$ and $\mathcal{S}(Y|X)$ are closed and convex, and, if $X$ is nondegenerate, then they are also
nonempty.

(ii) $\mathcal{S}^0(Y|X)$ and $\mathcal{S}(Y|X)$ are compact when $\mathcal{R}(Y|X) < \infty$ and $X$ is nondegenerate.

(iii) If $c \in \mathcal{S}(Y|X)$, then $(c_0, c) \in \mathcal{S}^0(Y|X)$ for $c_0 \in \mathcal{S}(Y - \langle c, X\rangle)$, whereas if $(c_0, c) \in \mathcal{S}^0(Y|X)$, then
$c_0 \in \mathcal{S}(Y - \langle c, X\rangle)$ and $c \in \mathcal{S}(Y|X)$.

(iv) The following alternative formulae hold:

\[ \mathcal{S}^0(Y|X) = \operatorname*{argmin}_{c_0 \in \mathbb{R},\, c \in \mathbb{R}^n} \mathcal{E}(Y - [c_0 + \langle c, X\rangle]) \quad \text{and} \quad \mathcal{S}(Y|X) = \operatorname*{argmin}_{c \in \mathbb{R}^n} \mathcal{D}(Y - \langle c, X\rangle). \]

Proof. For Part (i), closedness and convexity are consequences of the fact that both sets are optimal
solution sets of the minimization of closed and convex functions. The nonemptiness follows from
Theorem 3.4(vi). For Part (ii), suppose that the sequence $\{(c_0^\nu, c^\nu)\}_{\nu=1}^\infty$ satisfies $(c_0^\nu, c^\nu) \in \mathcal{S}^0(Y|X)$ for
all $\nu$. Then, an application of Lemma 3.3, with $Y^\nu = Y$, $X^\nu = X$, $b^\nu = b = \inf_{c_0, c} \mathcal{E}(Y - [c_0 + \langle c, X\rangle])$,
implies that $\{(c_0^\nu, c^\nu)\}_{\nu=1}^\infty$ is bounded and $\mathcal{S}^0(Y|X)$, which is closed by Part (i), is therefore compact. A
nearly identical argument leads to the compactness of $\mathcal{S}(Y|X)$. Part (iii) follows trivially. Part (iv) is a
consequence of Theorem 3.4(i). □

Generalized linear regression constructs a model $c_0 + \langle c, X\rangle$ of $Y$ by solving the regression problem

\[ \operatorname*{minimize}_{c_0 \in \mathbb{R},\, c \in \mathbb{R}^n} \ \mathcal{E}(Y - [c_0 + \langle c, X\rangle]) \]

with respect to the regression coefficients $c_0$ and $c$. The choice of error measure $\mathcal{E} = \|\cdot\|_2$ recovers
the classical least-squares regression technique, but numerous other choices exist. See for example
[22, 26, 21], the Appendix, and the subsequent development. It is clear from Theorem 3.9(iii) that
the regression coefficients can be obtained alternatively by first computing a “slope” $c \in \mathcal{S}(Y|X)$ and
then setting the intercept $c_0 \in \mathcal{S}(Y - \langle c, X\rangle)$, with potential computational advantages. Moreover,
Theorem 3.9 shows that points furnishing the minimum value in the definition of residual risk under
regret measure $\mathcal{V}$ coincide with the regression coefficients obtained in the regression problem using the
corresponding error measure $\mathcal{E} = \mathcal{V} - E[\cdot]$. Further connections between residual risk and regression
are highlighted in the next example.


Example 4: Entropic risk. In expected utility theory, the utility $U(W) = E[1 - \exp(-W)]$ of “gain”
$W$ is a well-known form, which in our setting, focusing on losses instead of gains, translates into the
regret $\mathcal{V}(Y) = E[\exp(Y) - 1]$ of “loss” $Y = -W$. The measure of regret $\mathcal{V}$ is regular and generates the
corresponding measure of risk $\mathcal{R}(Y) = \log E[\exp Y]$ and measure of error $\mathcal{E}(Y) = E[\exp(Y) - Y - 1]$
by an application of Theorem 2.2. In this case, the corresponding statistic $\mathcal{S}$ coincides with $\mathcal{R}$, which
implies that for $(c_0, c) \in \mathcal{S}^0(Y|X)$, we have

\[ \mathcal{R}(Y|X) = \langle c, E[X]\rangle + \mathcal{R}(Y - \langle c, X\rangle) \quad \text{and} \quad c_0 \in \mathcal{S}(Y - \langle c, X\rangle) = \{\mathcal{R}(Y - \langle c, X\rangle)\}. \]

Hence,

\[ \mathcal{R}(Y|X) = c_0 + \langle c, E[X]\rangle \]

and the residual risk of $Y$ coincides with the value of the regression function $c_0 + \langle c, \cdot\rangle$ at $E[X]$ when
that function is obtained by minimizing the corresponding error measure $\mathcal{E}(Y) = E[\exp(Y) - Y - 1]$.
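The relations among entropic regret, risk, error, and statistic in Example 4 can be verified on a small sample. The sketch below is only an illustration: the loss values are hypothetical, expectations are sample averages, and `tradeoff` denotes the objective $c_0 + \mathcal{V}(Y - c_0)$ of the trade-off formula (6).

```python
from math import exp, log

def mean(values):
    return sum(values) / len(values)

def entropic_regret(y):
    """V(Y) = E[exp(Y) - 1]."""
    return mean([exp(v) - 1.0 for v in y])

def entropic_risk(y):
    """R(Y) = log E[exp(Y)]."""
    return log(mean([exp(v) for v in y]))

def entropic_error(y):
    """E(Y) = E[exp(Y) - Y - 1] = V(Y) - E[Y]."""
    return mean([exp(v) - v - 1.0 for v in y])

y = [-1.0, -0.5, 0.1, 0.4, 0.7, 1.2]   # hypothetical equally likely losses
risk = entropic_risk(y)

def tradeoff(c0):
    """Objective c0 + V(Y - c0) whose minimum value is the risk of Y."""
    return c0 + entropic_regret([v - c0 for v in y])

# The statistic coincides with the risk: c0 = R(Y) attains the minimum,
# and the minimum value is again R(Y).
print("R(Y) =", risk, " objective at c0 = R(Y):", tradeoff(risk))
print("objective nearby:", tradeoff(risk - 0.1), tradeoff(risk + 0.1))
```

Setting the derivative of the objective with respect to $c_0$ to zero gives $E[\exp(Y - c_0)] = 1$, which is why the minimizer equals $\log E[\exp Y]$.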


The residual risk is directly tied to the “fit” in the regression as developed next. In least-squares
regression, the coefficient of determination for the model $c_0 + \langle c, \cdot\rangle$ is given by

\[ R^2_{\rm LS}(c_0, c) = 1 - \frac{E[(Y - [c_0 + \langle c, X\rangle])^2]}{E[(Y - E[Y])^2]} \qquad (10) \]

and provides a means for assessing the goodness-of-fit. Although the coefficient cannot be relied on
exclusively, it provides an indication of the goodness of fit that is easily extended to the context of
generalized regression using the insight of risk quadrangles. From Example 1’ in [26], we know that the
numerator in (10) is the mean-squared error measure applied to $Y - [c_0 + \langle c, X\rangle]$ and the denominator
is the “classical” deviation measure $\mathcal{D}(Y) = E[(Y - E[Y])^2]$. Moreover, the minimization of that
mean-squared error of $Y - [c_0 + \langle c, X\rangle]$ results in the least-squares regression coefficients. According
to [26], these error and deviation measures are parts of a risk quadrangle and yield the expectation as
its statistic. The Appendix provides further details for the essentially equivalent case involving square
roots of the above quantities. These observations motivate the following definition of a generalized
coefficient of determination for regression with error measure $\mathcal{E}$ (see [21, 17] for the cases of quantile
and superquantile regression).


3.10 Definition (generalized coefficients of determination) For a regular measure of error $\mathcal{E}$ and
corresponding measure of deviation $\mathcal{D}$, the generalized coefficient of determination is given by$^5$

\[ \bar{R}^2(c_0, c) := 1 - \frac{\mathcal{E}(Y - [c_0 + \langle c, X\rangle])}{\mathcal{D}(Y)} \quad \text{for } c_0 \in \mathbb{R},\ c \in \mathbb{R}^n \]

and the fitted coefficient of determination is given by

\[ \bar{R}^2 := 1 - \frac{\inf_{c_0 \in \mathbb{R},\, c \in \mathbb{R}^n} \mathcal{E}(Y - [c_0 + \langle c, X\rangle])}{\mathcal{D}(Y)}. \qquad (11) \]

As in the classical case, higher values of $\bar{R}^2$ are better, at least in some sense. Indeed, a regression
problem aims to minimize the error of $Y - [c_0 + \langle c, X\rangle]$ by wisely selecting the regression coefficients
$(c_0, c)$ and thereby also maximizes $\bar{R}^2(c_0, c)$. The error is normalized with the overall “nonconstancy” in $Y$ as
measured by its deviation measure to more easily allow for comparison of coefficients of determination
across data sets.


3.11 Proposition (properties of generalized coefficients of determination) The generalized and fitted
coefficients of determination satisfy

\[ \bar{R}^2(c_0, c) \le \bar{R}^2 \le 1 \ \text{ for } c_0 \in \mathbb{R} \text{ and } c \in \mathbb{R}^n; \quad \text{and} \quad \bar{R}^2 \ge 0. \]

Proof. The upper bound follows directly from the nonnegativity of error and deviation measures. Due
to the minimization in the fitted coefficient of determination, $\bar{R}^2(c_0, c) \le \bar{R}^2$. The lower bound is a
consequence of the fact that

\[ \inf_{c_0 \in \mathbb{R},\, c \in \mathbb{R}^n} \mathcal{E}(Y - [c_0 + \langle c, X\rangle]) \le \inf_{c_0 \in \mathbb{R}} \mathcal{E}(Y - c_0) = \mathcal{D}(Y), \]

which completes the proof. □

The  connection  with  residual  risk  is  given  next. 

3.12 Theorem (residual risk in terms of coefficient of determination) The measure of residual risk
associated with regular measures of error $\mathcal{E}$ and deviation $\mathcal{D}$ satisfies

\[ \mathcal{R}(Y|X) = E[Y] + \mathcal{D}(Y)(1 - \bar{R}^2), \]

where $\bar{R}^2$ is the associated fitted coefficient of determination given by (11).

$^5$ Here, $\infty/\infty$ and $0/0$ are interpreted as 1.

Proof. Direct application of (11) and Theorem 3.4(i) yields the conclusion. □

We recall from Theorem 2.2(i) that $\mathcal{R}(Y) = E[Y] + \mathcal{D}(Y)$. Theorem 3.12 shows that the residual risk
is less than that quantity by an amount related to the goodness-of-fit of the regression curve obtained
by minimizing the corresponding error measure.
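For the least-squares case, the identity of Theorem 3.12 can be traced through directly. The sketch below assumes the quadrangle behind (10), with error $\mathcal{E}(Y) = E[Y^2]$, deviation $\mathcal{D}(Y) = E[(Y - E[Y])^2]$, and hence risk $\mathcal{R}(Y) = E[Y] + \mathcal{D}(Y)$; it also assumes a scalar $X$, synthetic data, and sample averages in place of expectations. All variable names are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = [2.0 + 1.5 * xi + rng.gauss(0.0, 1.0) for xi in x]   # assumed data model

mx, my = mean(x), mean(y)
var_x = mean([(xi - mx) ** 2 for xi in x])
var_y = mean([(yi - my) ** 2 for yi in y])                 # deviation D(Y)
cov_xy = mean([(xi - mx) * (yi - my) for xi, yi in zip(x, y)])

# Least-squares coefficients minimize the mean-squared error of Y - [c0 + c X].
c = cov_xy / var_x
c0 = my - c * mx
min_error = mean([(yi - (c0 + c * xi)) ** 2 for xi, yi in zip(x, y)])

r2 = 1.0 - min_error / var_y              # fitted coefficient of determination (11)
risk = my + var_y                         # R(Y) = E[Y] + D(Y)
residual_risk = my + min_error            # Theorem 3.4(i): E[Y] + minimal error
identity_rhs = my + var_y * (1.0 - r2)    # right-hand side in Theorem 3.12

print("R^2 =", r2, " R(Y) =", risk, " R(Y|X) =", residual_risk, " via 3.12:", identity_rhs)
```

A fit with coefficient of determination close to one leaves little beyond $E[Y]$, whereas a coefficient near zero leaves the residual risk near $\mathcal{R}(Y)$.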


4  Surrogate  Estimation 

As alluded to in Section 1, applications might demand an approximation of a random variable $Y$ in terms
of a better known random vector $X$. Restricting the attention to affine functions $f(X) = c_0 + \langle c, X\rangle$
of $X$, the goal becomes how to best select $c_0 \in \mathbb{R}$ and $c \in \mathbb{R}^n$ such that $c_0 + \langle c, X\rangle$ is a reasonable
surrogate estimate of $Y$. Of course, this task is closely related to the regression problem of the previous
section. Here, we focus on the ability of surrogate estimates to generate approximations of risk. In this
section, we develop “best” risk-tuned surrogate estimates and show how they are intimately connected
with measures of residual risk. We also discuss surrogate estimation in the context of incomplete
information, often the setting of primary interest in practice.

4.1  Risk  Tuning 

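Suppose that $\mathcal{R}$ is a regular measure of risk and $Y \in \mathcal{L}^2$ is a random variable to be approximated.
Then, for a random vector $X \in \mathcal{L}^2_n$ and $c \in \mathbb{R}^n$,

\[
\begin{aligned}
\mathcal{R}(Y) &= \mathcal{R}\big( E[Y] + \langle c, X - E[X]\rangle + Y - E[Y] - \langle c, X - E[X]\rangle \big) \\
&\le \lambda \mathcal{R}\Big( \frac{1}{\lambda}\big(E[Y] + \langle c, X - E[X]\rangle\big) \Big) + (1 - \lambda) \mathcal{R}\Big( \frac{1}{1 - \lambda}\big(Y - E[Y] - \langle c, X - E[X]\rangle\big) \Big),
\end{aligned}
\]

for all $\lambda \in (0, 1)$ because convexity holds. Consequently, an upper bound on the one-sided difference
between risk $\mathcal{R}(Y)$ and the risk of the (scaled) surrogate estimate $c_0 + \langle c, X\rangle$, with $c_0 = E[Y - \langle c, X\rangle]$,
is given by

\[ \mathcal{R}(Y) - \lambda \mathcal{R}\Big( \frac{1}{\lambda}\big(c_0 + \langle c, X\rangle\big) \Big) \le \langle c, E[X]\rangle + (1 - \lambda) \mathcal{R}\Big( \frac{1}{1 - \lambda}\big(Y - \langle c, X\rangle\big) \Big) - E[Y]. \]

The upper bounding right-hand side is nonnegative because $\mathcal{R}(Z) \ge E[Z]$ for any $Z \in \mathcal{L}^2$ and is
minimized by selecting $c \in \mathcal{S}(Y/(1 - \lambda)\,|\,X/(1 - \lambda))$. (We recall that $\mathcal{S}(Y|X)$ is nonempty by Theorem 3.9
when $X$ is nondegenerate.) The minimum value is the (scaled) residual risk $(1 - \lambda)\mathcal{R}(Y/(1 - \lambda)\,|\,X/(1 - \lambda))$
minus $E[Y]$. Again, in view of Theorem 3.9, such $c$ is achieved by carrying out generalized regression,
minimizing the corresponding measure of error. This insight proves the next result, which, in part, is
also implicit in [22] where no connection with residual risk is revealed and positive homogeneity is
assumed.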

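4.1 Theorem (surrogate estimation) For a given $X \in \mathcal{L}^2_n$, suppose that $\mathcal{R}$ is a regular measure of
risk, and $\mathcal{R}(\cdot|X)$ and $\mathcal{S}(\cdot|X)$ are the associated measure of residual risk and partial residual statistic,
respectively. For any $\lambda \in (0, 1)$, let $Y_\lambda = Y/(1 - \lambda)$ and $X_\lambda = X/(1 - \lambda)$. Then, the surrogate estimate
$c_0 + \langle c, X\rangle$ of $Y$ given by

\[ c \in \mathcal{S}(Y_\lambda|X_\lambda) \quad \text{and} \quad c_0 = E[Y - \langle c, X\rangle] \]

satisfies

\[ \mathcal{R}(Y) - \lambda \mathcal{R}\Big( \frac{1}{\lambda}\big(c_0 + \langle c, X\rangle\big) \Big) \le (1 - \lambda)\mathcal{R}(Y_\lambda|X_\lambda) - E[Y]. \qquad (12) \]

The surrogate estimate $\bar{c}_0 + \langle c, X\rangle$, with $\bar{c}_0 = (1 - \lambda)\mathcal{R}(Y_\lambda - \langle c, X_\lambda\rangle)$, satisfies

\[ \mathcal{R}(Y) \le \lambda \mathcal{R}\Big( \frac{1}{\lambda}\big(\bar{c}_0 + \langle c, X\rangle\big) \Big). \]

Proof. The first result follows by the arguments prior to the theorem. The second result is a
consequence of moving the right-hand side term of (12) to the left-hand side and incorporating that term
into the constant $c_0$, which is permitted because $\mathcal{R}(Y + k) = \mathcal{R}(Y) + k$ for $Y \in \mathcal{L}^2$ and $k \in \mathbb{R}$. □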

The positive homogeneity of $\mathcal{R}$ allows us to simplify the above statements.

4.2 Corollary For a given $X \in \mathcal{L}^2_n$, suppose that $\mathcal{R}$ is a positively homogeneous regular measure of
risk, and $\mathcal{R}(\cdot|X)$ and $\mathcal{S}(\cdot|X)$ are the associated measure of residual risk and partial residual statistic,
respectively. Then, the surrogate estimate $c_0 + \langle c, X\rangle$ of $Y$ given by

\[ c \in \mathcal{S}(Y|X) \quad \text{and} \quad c_0 = E[Y - \langle c, X\rangle] \]

satisfies

\[ \mathcal{R}(Y) - \mathcal{R}(c_0 + \langle c, X\rangle) \le \mathcal{R}(Y|X) - E[Y]. \]

The surrogate estimate $\bar{c}_0 + \langle c, X\rangle$, with $\bar{c}_0 = \mathcal{R}(Y - \langle c, X\rangle)$, satisfies

\[ \mathcal{R}(Y) \le \mathcal{R}(\bar{c}_0 + \langle c, X\rangle). \]

Example 5: Risk-tuned Gaussian approximation. Theorem 4.1 supports the construction of
risk-tuned Gaussian approximations of a random variable $Y$, which can be achieved by considering a
Gaussian random vector $X$. Observations of $(Y, X)$ could be the basis for generalized regression with a
measure of error corresponding to $\mathcal{R}$, which then would establish $c$ and subsequently $\bar{c}_0$. Then, $\bar{c}_0 + \langle c, X\rangle$
is a risk-tuned Gaussian approximation of $Y$. If $\mathcal{R}$ is positively homogeneous, then $\mathcal{R}(\bar{c}_0 + \langle c, X\rangle)$ is
an approximate upper bound on $\mathcal{R}(Y)$, with the imprecision following from the passing to an empirical
measure generated by the observations of $(Y, X)$. We next discuss such approximations in further detail.
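The conservative surrogate of Corollary 4.2 can be sketched on simulated observations. The code below assumes the positively homogeneous superquantile/CVaR risk measure, a scalar Gaussian $X$, a skewed loss $Y$ generated by a hypothetical model, and the empirical measure of the observations in place of the true distribution. The slope is found by a ternary search on the convex function $c \mapsto \langle c, E[X]\rangle + \mathcal{R}(Y - \langle c, X\rangle)$.

```python
import random

def cvar(losses, alpha):
    """Empirical CVaR via min over c0 of c0 + E[max(0, Y - c0)] / (1 - alpha),
    evaluated at the sample points (the objective is piecewise linear, convex)."""
    ys = sorted(losses)
    n = len(ys)
    best = float("inf")
    tail_sum = 0.0  # sum of the sample points located after position k
    for k in range(n - 1, -1, -1):
        c0 = ys[k]
        excess = tail_sum - (n - 1 - k) * c0
        best = min(best, c0 + excess / ((1.0 - alpha) * n))
        tail_sum += c0
    return best

rng = random.Random(2)
n, alpha = 500, 0.9
x = [rng.gauss(0.0, 1.0) for _ in range(n)]        # Gaussian explanatory variable
y = [xi + rng.expovariate(1.0) for xi in x]        # skewed, non-Gaussian loss (assumed)
mean_x = sum(x) / n

def phi(c):
    return c * mean_x + cvar([yi - c * xi for yi, xi in zip(y, x)], alpha)

lo, hi = -5.0, 5.0
for _ in range(100):                                # approximate c in S(Y|X)
    m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
    if phi(m1) <= phi(m2):
        hi = m2
    else:
        lo = m1
c = 0.5 * (lo + hi)

c0_bar = cvar([yi - c * xi for yi, xi in zip(y, x)], alpha)   # intercept R(Y - <c, X>)
surrogate = [c0_bar + c * xi for xi in x]                     # Gaussian surrogate of Y

risk_y = cvar(y, alpha)
risk_surrogate = cvar(surrogate, alpha)
print("R(Y) =", risk_y, " R(surrogate) =", risk_surrogate)
```

On the empirical measure the bound $\mathcal{R}(Y) \le \mathcal{R}(\bar{c}_0 + \langle c, X\rangle)$ holds for any slope by subadditivity; choosing $c$ from the partial residual statistic is what keeps the bound tight.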

4.2  Approximate  Random  Variables 

Surrogate estimation and generalized regression are often carried out in the context of incomplete
(distributional) information about the underlying random variables. A justification for utilizing approximate
random variables is provided by the next two results. The first result establishes consistency in
generalized regression and the second proves that surrogate estimates using approximate random variables
remain conservative in the limit as the approximation vanishes. We refer to [30] for consistency of
sample-average approximations in risk minimization problems.

4.3 Theorem (consistency of residual statistic and regression) Suppose that $\mathcal{V}$ is a finite regular
measure of regret and that $Y^\nu \in \mathcal{L}^2$, $X^\nu = (X_1^\nu, ..., X_n^\nu) \in \mathcal{L}^2_n$, $\nu = 0, 1, 2, ...$, satisfy

\[ Y^\nu \to Y^0 \quad \text{and} \quad X_i^\nu \to X_i^0 \ \text{ for all } i, \ \text{ as } \nu \to \infty. \]

If $\mathcal{S}^0(\cdot|X^\nu)$ are the associated residual statistics, then$^6$

\[ \limsup_{\nu} \mathcal{S}^0(Y^\nu|X^\nu) \subset \mathcal{S}^0(Y^0|X^0). \]

Proof. Let $c_0 \in \mathbb{R}$ and $c \in \mathbb{R}^n$. Since $\mathcal{V}$ is finite, closed, and convex, it is continuous. Moreover,
$E[X^\nu] \to E[X^0]$. For $\nu = 0, 1, 2, \dots$, let $\varphi^\nu : \mathbb{R}^{n+1} \to \mathbb{R}$ be defined by

$\varphi^\nu(c_0, c) = c_0 + \langle c, E[X^\nu]\rangle + \mathcal{V}(Y^\nu - [c_0 + \langle c, X^\nu\rangle])$.

Then, as $\nu \to \infty$, $\varphi^\nu(c_0,c) \to \varphi^0(c_0,c)$. Thus, the finite and convex functions $\varphi^\nu$ converge pointwise
on $\mathbb{R}^{n+1}$ to $\varphi^0$ and therefore they also epi-converge to the same limit by [27, Theorem 7.17]. The
conclusion is then a consequence of [27, Theorem 7.31]. □

The theorem establishes that solutions of approximate generalized regression problems are indeed
approximations of solutions of the actual regression problem. We observe that if $(Y^\nu, X^\nu)$ converges in
distribution to $(Y^0, X^0)$ as well as $E[(Y^\nu)^2] \to E[(Y^0)^2]$ and $E[(X_i^\nu)^2] \to E[(X_i^0)^2]$ for all $i$, then the
$\mathcal{L}^2$-convergence assumption of the theorem holds.
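As a small numerical illustration of this consistency property, consider least-squares regression, which is the generalized regression associated with the mean risk quadrangle recalled in the Appendix. The sketch below is not taken from the paper: the data, the perturbation directions `dx` and `dy`, and the helper `least_squares_1d` are hypothetical choices. It perturbs a fixed data set by terms of order $1/\nu$ and records how far the fitted intercept and slope are from those obtained with the limiting data.

```python
def least_squares_1d(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical limiting data (X^0, Y^0) and fixed perturbation directions.
x0 = [0.0, 1.0, 2.0, 3.0, 4.0]
y0 = [1.0, 2.9, 5.2, 7.1, 8.8]
dx = [0.3, -0.2, 0.1, -0.4, 0.2]
dy = [-0.5, 0.4, 0.3, -0.2, 0.1]

limit_fit = least_squares_1d(x0, y0)

errors = []
for nu in (1, 10, 100, 1000):
    # Approximating data (X^nu, Y^nu) converge to (X^0, Y^0) as nu grows.
    x_nu = [x + d / nu for x, d in zip(x0, dx)]
    y_nu = [y + d / nu for y, d in zip(y0, dy)]
    fit = least_squares_1d(x_nu, y_nu)
    # Largest deviation of the fitted coefficients from the limiting ones.
    errors.append(max(abs(fit[0] - limit_fit[0]), abs(fit[1] - limit_fit[1])))
    print(nu, fit, errors[-1])
```

In this toy setting the coefficient error shrinks with the size of the perturbation, which is the behavior the theorem guarantees for general finite regular measures of regret.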

Approximations  in  surrogate  estimation  are  addressed  next. 

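4.4 Theorem (surrogate estimation under approximations) Suppose that $\mathcal{R}$ is a regular measure of
risk and $\mathcal{R}(\cdot|X)$ and $\mathcal{S}(\cdot|X)$, $X \in \mathcal{L}^2_n$, are the associated measure of residual risk and partial residual
statistic. Let $Y^\nu \in \mathcal{L}^2$, $X^\nu = (X_1^\nu, \dots, X_n^\nu) \in \mathcal{L}^2_n$, $\nu = 0, 1, 2, \dots$, satisfy

$Y^\nu \to Y^0$ and $X_i^\nu \to X_i^0$ for all $i$, as $\nu \to \infty$.

Moreover, suppose that the functional $(Y, X) \mapsto \mathcal{R}(Y|X)$ is continuous at $(Y^0, X^0)$, $\mathcal{R}$ is continuous at
0, and $X^0$ is nondegenerate. Then, the surrogate estimates $\bar c_0^\nu + \langle c^\nu, X^0\rangle$, $\nu = 1, 2, \dots$, of $Y^0$ given by

$c^\nu \in \mathcal{S}(Y_\lambda^\nu | X_\lambda^\nu)$ and $\bar c_0^\nu = (1-\lambda)\mathcal{R}(Y_\lambda^\nu - \langle c^\nu, X_\lambda^\nu\rangle)$,

with $\lambda \in (0,1)$, $Y_\lambda^\nu = Y^\nu/(1-\lambda)$, and $X_\lambda^\nu = X^\nu/(1-\lambda)$, satisfy

$\mathcal{R}(Y^0) \le \liminf_\nu \mu\lambda\,\mathcal{R}\big(\tfrac{1}{\mu\lambda}(\bar c_0^\nu + \langle c^\nu, X^0\rangle)\big)$

for all $\mu \in (0,1)$.

6 Recall that for a sequence of sets $\{A^\nu\}$, the outer limit $\limsup_\nu A^\nu$ is the collection of all points that are limits of subsequences of points selected from $\{A^\nu\}$.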
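Proof. Since

$\bar c_0^\nu + \langle c^\nu, X^\nu\rangle = \bar c_0^\nu + \langle c^\nu, X^0\rangle + \langle c^\nu, X^\nu - X^0\rangle$,

convexity of $\mathcal{R}$ and Theorem 4.1, applied for every $\nu$, imply that

$\mathcal{R}(Y^\nu) \le \lambda\mathcal{R}\big(\tfrac{1}{\lambda}(\bar c_0^\nu + \langle c^\nu, X^\nu\rangle)\big) \le \mu\lambda\mathcal{R}\big(\tfrac{1}{\mu\lambda}(\bar c_0^\nu + \langle c^\nu, X^0\rangle)\big) + (1-\mu)\lambda\mathcal{R}\big(\tfrac{1}{(1-\mu)\lambda}\langle c^\nu, X^\nu - X^0\rangle\big)$. (13)

Next, we establish the boundedness of $\{c^\nu\}_{\nu=1}^\infty$. An application of Lemma 3.3, with $(c_0^\nu, c^\nu) \in \mathcal{S}^0(Y^\nu|X^\nu)$,
the associated residual statistic, $b^\nu = \mathcal{R}(Y^\nu|X^\nu) - E[Y^\nu]$, and $b = \mathcal{R}(Y^0|X^0) - E[Y^0]$ so that
$\mathcal{E}(Y^\nu - [c_0^\nu + \langle c^\nu, X^\nu\rangle]) = \mathcal{R}(Y^\nu|X^\nu) - E[Y^\nu]$ and $b^\nu \to b$, implies the boundedness of $\{(c_0^\nu, c^\nu)\}_{\nu=1}^\infty$ and
therefore also of $\{c^\nu\}_{\nu=1}^\infty$. The boundedness of $\{c^\nu\}_{\nu=1}^\infty$ and the fact that $X_i^\nu \to X_i^0$ for $i = 1, \dots, n$ result in
$\langle c^\nu, X^\nu - X^0\rangle \to 0$. Since $\mathcal{R}$ is continuous at 0, we have that $\mathcal{R}(\langle c^\nu, X^\nu - X^0\rangle) \to \mathcal{R}(0) = 0$ and due
to closedness, $\liminf_\nu \mathcal{R}(Y^\nu) \ge \mathcal{R}(Y^0)$. The conclusion therefore follows by taking limits on both sides
of (13). □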

Again,  the  positively  homogeneous  case  results  in  simplified  expressions. 

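4.5 Corollary If the assumptions of Theorem 4.4 hold and the surrogate estimates $\bar c_0^\nu + \langle c^\nu, X^0\rangle$,
$\nu = 1, 2, \dots$, of $Y^0$ are given by

$c^\nu \in \mathcal{S}(Y^\nu | X^\nu)$ and $\bar c_0^\nu = \mathcal{R}(Y^\nu - \langle c^\nu, X^\nu\rangle)$,

then

$\mathcal{R}(Y^0) \le \liminf_\nu \mathcal{R}(\bar c_0^\nu + \langle c^\nu, X^0\rangle)$.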


Theorem 4.4 supports surrogate estimation in the following context. Historical data, viewed as
observations of an unknown random variable $Y^0$ and a random vector $X^0$, can be utilized in generalized
regression using an error measure corresponding to a risk measure of interest. This yields the “slope”
$c^\nu$ and an “intercept” $\bar c_0^\nu$ subsequently computed as specified in Theorem 4.4. Suppose then that the
random vector $X^0$ becomes available, for example due to additional information arriving. This is
the typical case in factor models in finance where $Y^0$ is a stock’s random return and $X^0$ might be
macroeconomic factors such as interest rates and GDP growth. Forecasts of such factors are then used
for $X^0$. Alternatively, $X^0$ might have been available from the beginning, which is the case when it is
an input vector to a discrete-event simulation selected by the analyst. Regardless of the circumstances,
the surrogate estimate $\bar c_0^\nu + \langle c^\nu, X^0\rangle$ then provides an approximation of $Y^0$ that is “tuned” to the risk
measure of interest. If the initial data set is large, then, in view of Theorem 4.4, we expect the risk of the
surrogate estimate to be an approximate upper bound on the risk of $Y^0$.
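The following sketch illustrates the workflow in the positively homogeneous case with the superquantile/CVaR risk measure and a single explanatory variable. It is only an illustration under stated assumptions: the synthetic data, the level `alpha`, and the helpers `cvar` and `minimize_convex_1d` (a golden-section search standing in for a proper regression solver) are hypothetical and not part of the paper. The slope minimizes $c\,E[X] + \mathcal{R}(Y - cX)$ over the empirical distribution, and the intercept is $\mathcal{R}(Y - cX)$.

```python
import math
import random


def cvar(samples, alpha):
    """Empirical superquantile/CVaR at level alpha for equally likely samples."""
    tail = (1.0 - alpha) * len(samples)  # tail mass measured in number of samples
    total, used = 0.0, 0.0
    for value in sorted(samples, reverse=True):
        weight = min(1.0, tail - used)
        if weight <= 0.0:
            break
        total += weight * value
        used += weight
    return total / tail


def minimize_convex_1d(f, lo, hi, iters=100):
    """Golden-section search for a minimizer of a convex function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc <= fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


# Hypothetical historical data: observations of (X, Y).
random.seed(0)
n, alpha = 400, 0.9
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [2.0 + 1.5 * x + random.gauss(0.0, 0.5) for x in xs]
mean_x = sum(xs) / n


def objective(c):
    # c * E[X] + R(Y - c X); its minimum value is the residual risk R(Y|X).
    return c * mean_x + cvar([y - c * x for x, y in zip(xs, ys)], alpha)


slope = minimize_convex_1d(objective, -10.0, 10.0)
intercept = cvar([y - slope * x for x, y in zip(xs, ys)], alpha)
residual_risk = objective(slope)
risk_y = cvar(ys, alpha)
risk_surrogate = cvar([intercept + slope * x for x in xs], alpha)
print(slope, intercept, residual_risk, risk_y, risk_surrogate)
```

On the empirical distribution, the risk of the surrogate estimate is at least the risk of the observed variable, and the residual risk lies between the sample mean and the risk of $Y$.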

A situation for which the mapping $(Y, X) \mapsto \mathcal{R}(Y|X)$ is continuous, as required by Theorem 4.4, is
stated next.

4.6 Proposition The functional $(Y, X) \mapsto \mathcal{R}(Y|X)$ on $\mathcal{L}^2_{n+1}$, given in terms of a regular measure of
risk $\mathcal{R}$, is

(i) convex,

(ii) closed at points $(Y, X)$ where $X$ is nondegenerate, and

(iii) continuous if $\mathcal{R}$ is finite.

Proof. Part (i) follows by a similar argument to the one leading to the convexity of $\mathcal{R}(\cdot|X)$ for fixed
$X$; see Theorem 3.4. For Part (ii), we consider $Y^\nu \to Y$, $X^\nu \to X$, $(c_0^\nu, c^\nu) \in \mathrm{argmin}_{c_0, c}\, \mathcal{E}(Y^\nu - [c_0 + \langle c, X^\nu\rangle])$,
which is nonempty due to Theorem 3.4 under the nondegenerate assumption on $X$, and
$\mathcal{E}(Y^\nu - [c_0^\nu + \langle c^\nu, X^\nu\rangle]) \le b \in \mathbb{R}$ for all $\nu$. Hence, $\mathcal{R}(Y^\nu|X^\nu) - E[Y^\nu] = \mathcal{E}(Y^\nu - [c_0^\nu + \langle c^\nu, X^\nu\rangle]) \le b$
for all $\nu$. An application of Lemma 3.3 implies that there exist $c_0^* \in \mathbb{R}$, $c^* \in \mathbb{R}^n$, and a subsequence
$\{(c_0^\nu, c^\nu)\}_{\nu \in N}$, with $(c_0^\nu, c^\nu) \to_N (c_0^*, c^*)$, and $\mathcal{E}(Y - [c_0^* + \langle c^*, X\rangle]) \le b$. Consequently,
$\mathcal{R}(Y|X) - E[Y] = \min_{c_0, c} \mathcal{E}(Y - [c_0 + \langle c, X\rangle]) \le \mathcal{E}(Y - [c_0^* + \langle c^*, X\rangle]) \le b$, which establishes the closedness of $\mathcal{R}(\cdot|\cdot) - E[\cdot]$
at points $(Y, X)$ with $X$ nondegenerate. The expectation functional is finite and continuous on $\mathcal{L}^2$ so
the closedness of $\mathcal{R}(\cdot|\cdot)$ is also established at such points. In Part (iii) we first consider for $c \in \mathbb{R}^n$ the
functional

$(Y, X) \mapsto \varphi_c(Y, X) := \langle c, E[X]\rangle + \mathcal{R}(Y - \langle c, X\rangle)$,

which is convex and closed on $\mathcal{L}^2_{n+1}$ by the regularity of $\mathcal{R}$. Since $\mathcal{R}$ is finite, $\varphi_c$ is also finite and
therefore continuous. Thus, $\varphi_c$ is bounded above on a neighborhood of any point in $\mathcal{L}^2_{n+1}$. Since
$\mathcal{R}(\cdot|\cdot) \le \varphi_c(\cdot, \cdot)$ for all $c \in \mathbb{R}^n$, $\mathcal{R}(\cdot|\cdot)$ is also bounded above on a neighborhood of any point in $\mathcal{L}^2_{n+1}$. In
view of [23, Theorem 8], the convexity and finiteness of $\mathcal{R}(\cdot|\cdot)$ together with this boundedness property
imply that $\mathcal{R}(\cdot|\cdot)$ is continuous. □


5  Tracking  of  Conditional  Values 

Applications  often  direct  the  interest  not  only  to  a  random  variable  Y,  but  also  to  random  variables 
representing  values  of  Y  given  certain  realizations  of  a  related  random  vector  X.  In  particular,  this  is 
the  case  when  the  random  vector  X  is,  at  least  eventually,  under  the  control  of  a  decision  maker. 

We consider the situation where for $g : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ and random vectors $X \in \mathcal{L}^2_n$ and $V \in \mathcal{L}^2_m$,
the random variable $Y \in \mathcal{L}^2$ is of the form

$Y = g(X, V)$,

where the equality holds almost surely$^7$. Then, the parameterized random variables

$Y_x = g(x, V)$, $x \in \mathbb{R}^n$,

represent “conditional” values of $Y$. The goal might be to track a specific statistic of $Y_x$ as $x$ varies or
to select $x \in \mathbb{R}^n$ such that $Y_x$ is in some sense minimized or adequately low, for example as quantified
by the risk of $Y_x$. If the distribution of $Y_x$ is unknown and costly to approximate, especially in view of
the set of values of $x$ that needs to be considered, it might be desirable to develop an approximation

$c_0 + \langle c, x\rangle \approx \mathcal{R}(Y_x)$, $x \in \mathbb{R}^n$.

We refer to such approximations of the risk of conditional random variables as risk tracking.

7 Of course, conditions on $g$ are needed to ensure that the random variable is in $\mathcal{L}^2$.

As  indicated  in  Section  1,  the  area  of  statistical  regression  indeed  examines  models  of  conditional 
random  variables,  but  typically  at  the  level  of  expectations,  such  as  in  classical  least-squares  regression, 
and  quantiles.  We  here  consider  more  general  statistics,  make  connections  with  measures  of  risk,  and 
examine  risk  tracking.  We  start  with  tracking  of  statistics. 


5.1  Statistic  Tracking 

We say that a regression function $c_0 + \langle c, \cdot\rangle$, computed by minimizing a regular measure of error, i.e.,
$(c_0, c) \in \mathcal{S}^0(Y|X)$, tracks the corresponding statistic if

$c_0 + \langle c, x\rangle \in \mathcal{S}(Y_x)$ for $x \in \mathbb{R}^n$.

Of course, this is what we have learned to expect in linear least-squares regression where the measure
of error is $\mathcal{E} = \|\cdot\|_2$ and the statistic is the expectation, which is always a singleton. In view
of the Regression Theorem in [26], this can also be counted on in situations with error measures of
the “expectation type.” However, tracking might fail if the conditional statistic is not captured by the
family of regression functions under consideration, and it can fail in other situations as well, as shown in [21].

The  next  result  deals  with  a  standard  model  in  regression  analysis,  under  which  statistic  tracking 
is  achieved  for  regular  error  measures. 


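5.1 Theorem (statistic tracking in regression) For given $c_0^* \in \mathbb{R}$, $c^* \in \mathbb{R}^n$, suppose that

$Y(\omega) = c_0^* + \langle c^*, X(\omega)\rangle + \varepsilon(\omega)$ for all $\omega \in \Omega$,

with $\varepsilon \in \mathcal{L}^2$ independent of $X_i \in \mathcal{L}^2$, $i = 1, \dots, n$. Moreover, let $\mathcal{E}$ be a regular measure of error and $\mathcal{R}$,
$\mathcal{S}$, and $\mathcal{S}(\cdot|X)$ be the corresponding risk measure, statistic, and partial residual statistic, respectively.
If $\mathcal{R}$ has a representable risk identifier at $\varepsilon$ and $\varepsilon \in \mathrm{int}(\mathrm{dom}\,\mathcal{R})$, then $c^* \in \mathcal{S}(Y|X)$ and

$c_0 + \langle c^*, x\rangle \in \mathcal{S}(Y_x)$ for all $x \in \mathbb{R}^n$ and $c_0 \in \mathcal{S}(Y - \langle c^*, X\rangle)$.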


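Proof. Let $\varphi : \mathbb{R}^n \to [0, \infty]$ be defined by $\varphi(c) = \mathcal{D}(\langle c, X\rangle + \varepsilon)$ for $c \in \mathbb{R}^n$. In view of [23, Theorem
19] and the fact that $\mathcal{D}(Z) = \mathcal{R}(Z) - E[Z]$, we obtain the subdifferential formula

$\partial\varphi(c) = \{ E[(Q - 1)X] \mid Q \in \partial\mathcal{R}(\langle c, X\rangle + \varepsilon) \}$.

Since there exists a $Q \in \partial\mathcal{R}(\varepsilon)$ that is independent of $X$ by Proposition 3.6 and $E[Q] = 1$ for every
$Q \in \mathcal{Q}$, we have that $0 \in \partial\varphi(0)$ and $c = 0$ minimizes $\varphi$. Moreover, $c = c^*$ minimizes $\mathcal{D}(\langle c^* - c, X\rangle + \varepsilon)$
and also $\mathcal{D}(Y - \langle c, X\rangle)$. Thus, $c^* \in \mathcal{S}(Y|X)$ by Theorem 3.9. Finally,

$c_0 \in \mathcal{S}(Y - \langle c^*, X\rangle) = \mathcal{S}(\varepsilon + c_0^*) = \mathcal{S}(\varepsilon) + \{c_0^*\}$.

Since

$\mathcal{S}(Y_x) = \mathcal{S}(c_0^* + \langle c^*, x\rangle + \varepsilon) = \{c_0^* + \langle c^*, x\rangle\} + \mathcal{S}(\varepsilon)$,

the conclusion follows. □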
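A small experiment can make the statement concrete for the quantile risk quadrangle of the Appendix, where the deviation is the superquantile minus the mean and the statistic is the quantile. The sketch below is an illustration only: the full factorial design (which makes $\varepsilon$ exactly independent of $X$ under the empirical measure), the constants `c0_true` and `c_true`, and the helper functions are hypothetical. The slope is obtained by minimizing the deviation of $Y - cX$, and the intercept is the $\alpha$-quantile of the residual.

```python
import math


def cvar(samples, alpha):
    """Empirical superquantile/CVaR at level alpha for equally likely samples."""
    tail = (1.0 - alpha) * len(samples)
    total, used = 0.0, 0.0
    for value in sorted(samples, reverse=True):
        weight = min(1.0, tail - used)
        if weight <= 0.0:
            break
        total += weight * value
        used += weight
    return total / tail


def quantile(samples, alpha):
    """alpha-quantile min{y : F(y) >= alpha} of equally likely samples."""
    ordered = sorted(samples)
    k = max(1, math.ceil(alpha * len(ordered) - 1e-9))
    return ordered[k - 1]


def minimize_convex_1d(f, lo, hi, iters=100):
    """Golden-section search for a minimizer of a convex function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc <= fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


alpha = 0.8
c0_true, c_true = 1.0, 2.0
x_grid = [0.0, 1.0, 2.0, 3.0]
eps_grid = [0.02 * i for i in range(-25, 25)]

# Full factorial design: every x is paired with every eps, so eps is
# independent of X under the empirical measure.
xs = [x for x in x_grid for _ in eps_grid]
es = [e for _ in x_grid for e in eps_grid]
ys = [c0_true + c_true * x + e for x, e in zip(xs, es)]


def deviation(c):
    # D(Y - c X) with D = CVaR - E, the deviation of the quantile quadrangle.
    residual = [y - c * x for x, y in zip(xs, ys)]
    return cvar(residual, alpha) - sum(residual) / len(residual)


c_hat = minimize_convex_1d(deviation, -10.0, 10.0)
c0_hat = quantile([y - c_hat * x for x, y in zip(xs, ys)], alpha)

# Quantile of Y_x = c0_true + c_true * x + eps for each x on the grid.
eps_quantile = quantile(eps_grid, alpha)
tracking_gap = max(
    abs(c0_hat + c_hat * x - (c0_true + c_true * x + eps_quantile)) for x in x_grid
)
print(c_hat, c0_hat, tracking_gap)
```

Because the empirical distribution is discrete, the minimizing slope is only determined up to the grid spacing of `eps_grid`, so the recovered coefficients agree with the underlying model up to a small tolerance rather than exactly.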

Example 6: Risk tracking of superquantile/CVaR. Superquantile regression [21] involves
minimizing the regular measure of error

$\mathcal{E}(Y) = \frac{1}{1-\alpha}\int_0^1 \max\{0, \bar G_Y(\beta)\}\,d\beta - E[Y]$

for $\alpha \in [0, 1)$, where $\bar G_Y(\beta)$ is the $\beta$-superquantile of $Y$, i.e., the CVaR of $Y$ at level $\beta$. The statistic
corresponding to this measure of error is a superquantile/CVaR; see [25, 21] and the Appendix. (We note
that the risk measure corresponding to this error measure is the second-order superquantile risk measure,
which is finite and also has a representable risk identifier; see the Appendix.) Consequently, Theorem
5.1 establishes that under the assumption about $Y$, there exists $(c_0, c) \in \mathcal{S}^0(Y|X)$, the associated
residual statistic of $\mathcal{E}$, such that

$c_0 + \langle c, x\rangle = \bar G_{Y_x}(\alpha)$ = superquantile-risk/CVaR of $Y_x$, for $x \in \mathbb{R}^n$.

In summary, risk tracking of superquantile-risk/CVaR is achieved by carrying out superquantile
regression; see [5] for an alternative approach to tracking CVaR.

5.2  Risk  Tracking 

In  the  previous  subsection  we  established  conditions  under  which  generalized  regression  using  a  specific 
measure  of  error  tracks  the  corresponding  statistic.  Even  though  one  can  make  connections  between 
statistics  and  measures  of  risk,  as  indicated  in  the  preceding  example,  a  direct  approach  to  risk  tracking 
is  also  beneficial.  We  next  develop  such  an  approach  that  relies  on  fewer  assumptions  about  the  form  of 
Y  as  a  function  of  X.  The  relaxed  conditions  require  us  to  limit  the  study  to  conservative  risk  tracking. 

The goal is to select $x$ such that $\mathcal{R}(Y_x)$ is minimized or sufficiently small for a given choice of risk
measure $\mathcal{R}$. This is the common setting of risk-averse stochastic programming. Here, in contrast to
the previous sections, there is no probability distribution associated with “x.” Still, when $g$ is costly
to evaluate, it might be desirable to develop an approximation of $\mathcal{R}(Y_x)$, $x \in \mathbb{R}^n$, through regression
based on observations $\{x^j, y^j\}_{j=1}^\nu$, where $x^j \in \mathbb{R}^n$ and $y^j = g(x^j, v^j)$, with $v^j$ being a realization of $V$,
$j = 1, \dots, \nu$. One cannot expect that a regression function $c_0 + \langle c, \cdot\rangle$ obtained from these observations
using an error measure corresponding to a specific risk measure generally tracks $\mathcal{R}(Y_x)$, $x \in \mathbb{R}^n$, even
if sampling errors are ignored. In fact, one can only hope to track the statistic as laid out in the
previous subsection. The next result, however, shows that one can achieve conservative risk tracking
under general assumptions.

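5.2 Theorem (conservative risk tracking) Suppose that $X \in \mathcal{L}^2_n$, $V \in \mathcal{L}^2_m$, and $g : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$
satisfy $g(X, V) \in \mathcal{L}^2$, $g(x, V) \in \mathcal{L}^2$ for all $x \in \mathbb{R}^n$, and there exists an $L : \mathbb{R}^m \to \mathbb{R}$, with $L(V) \in \mathcal{L}^2$,
such that

$|g(x, v) - g(x', v)| \le L(v)\|x - x'\|$ for all $x, x' \in \mathbb{R}^n$ and $v \in \mathbb{R}^m$.

Let $\mathcal{S}(\cdot|X)$ be a partial residual statistic associated with a positively homogeneous, monotonic, and
regular measure of risk $\mathcal{R}$. If $c \in \mathcal{S}(g(X, V)|X)$ and $\bar c_0 = \mathcal{R}(g(X, V) - \langle c, X\rangle)$, then for $x \in \mathbb{R}^n$,

$\mathcal{R}(g(x, V)) \le \bar c_0 + \langle c, x\rangle + \mathcal{R}(\langle c, X - x\rangle) + \mathcal{R}(L(V)\|X - x\|) \le \bar c_0 + \langle c, x\rangle + \rho\mathcal{R}(\|X - x\|)$,

where$^8$ $\rho = \sup L(V) + \|c\|$.

Moreover, the upper bound on $\mathcal{R}(g(x, V))$ is tight in the sense that if $\mathcal{R}$ is finite, $\rho < \infty$, and
$X^\nu \in \mathcal{L}^2_n$ is such that $X^\nu \to x$, then for $c^\nu \in \mathcal{S}(g(X^\nu, V)|X^\nu)$ and $\bar c_0^\nu = \mathcal{R}(g(X^\nu, V) - \langle c^\nu, X^\nu\rangle)$,

$\mathcal{R}(g(x, V)) = \lim_{\nu\to\infty} \big[\bar c_0^\nu + \langle c^\nu, x\rangle + \rho\mathcal{R}(\|X^\nu - x\|)\big]$

when $\{c^\nu\}_{\nu=1}^\infty$ is bounded.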

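Proof. The Lipschitz property for $g(\cdot, v)$ implies that

$g(x, V) \le g(X, V) + L(V)\|X - x\|$ a.s.

Since $\mathcal{R}$ is monotonic as well as sublinear, we obtain that

$\mathcal{R}(g(x, V)) \le \mathcal{R}(g(X, V)) + \mathcal{R}(L(V)\|X - x\|)$. (14)

Since

$\bar c_0 + \langle c, X\rangle = \bar c_0 + \langle c, x\rangle + \langle c, X - x\rangle$,

sublinearity of $\mathcal{R}$ implies that

$\mathcal{R}(\bar c_0 + \langle c, X\rangle) \le \bar c_0 + \langle c, x\rangle + \mathcal{R}(\langle c, X - x\rangle)$.

By Corollary 4.2,

$\mathcal{R}(g(X, V)) \le \mathcal{R}(\bar c_0 + \langle c, X\rangle)$.

Combining this result with (14) yields the first inequality of the theorem. The second inequality is
reached after realizing that the monotonicity and positive homogeneity of $\mathcal{R}$ imply that $\mathcal{R}(\langle c, X - x\rangle) \le
\|c\|\mathcal{R}(\|X - x\|)$ and $\mathcal{R}(L(V)\|X - x\|) \le \sup L(V)\,\mathcal{R}(\|X - x\|)$.

We next consider the final assertion. Since $\mathcal{R}$ is continuous and $\|X^\nu - x\| \to 0$, $\rho\mathcal{R}(\|X^\nu - x\|) \to
\rho\mathcal{R}(0) = 0$. Moreover,

$\bar c_0^\nu + \langle c^\nu, x\rangle \le \mathcal{R}(g(X^\nu, V)) + \mathcal{R}(\langle c^\nu, x - X^\nu\rangle)$.

The Lipschitz property ensures that $g(X^\nu, V) \to g(x, V)$ and the boundedness of $\{c^\nu\}_{\nu=1}^\infty$ results in
$\langle c^\nu, x - X^\nu\rangle \to 0$. In view of the continuity of $\mathcal{R}$ the conclusion follows. □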

Theorem 5.2 shows that an upper bound on the risk of a parameterized random variable can be
achieved by carrying out generalized regression with respect to a constructed random vector $X$. We
recall that in the setting of a parameterized random variable $Y_x = g(x, V)$ there is no intrinsic
probability distribution associated with “x.” However, an analyst can select a random vector $X$, carry
out generalized regression to obtain $c$, and compute $\bar c_0$. The obtained model $\bar c_0 + \langle c, \cdot\rangle$ might not be
conservative. However, an additional term $\rho\mathcal{R}(\|X - \cdot\|)$ shifts the model sufficiently higher to ensure
conservativeness.

8 Here the essential supremum is denoted by “sup.”

The additional term $\rho\mathcal{R}(\|X - \cdot\|)$ has an interesting form that guides the construction of $X$. If the
focus is on $x \in \mathbb{R}^n$ near $\hat x \in \mathbb{R}^n$, say within a “trust region” framework, then $X$ should be nearly the
constant $X = \hat x$ such that $\|X - \hat x\|$ is low as quantified by $\mathcal{R}$. We then expect a relatively low upper
bound on $\mathcal{R}(g(x, V))$ for $x$ near $\hat x$. In fact, this situation is addressed in the last part of the theorem.
However, as $x$ moves away from $\hat x$, the “penalty” $\rho\mathcal{R}(\|X - x\|)$ increases.
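The conservative bound can be checked numerically on an empirical distribution. The sketch below assumes a hypothetical scalar function `g(x, v) = |x - v|`, which is Lipschitz in $x$ with constant 1, the superquantile/CVaR risk measure, and a constructed random variable $X$ concentrated near a hypothetical current point `x_center`; none of these choices come from the paper. The slope and intercept are computed as in the theorem, with a golden-section search standing in for a regression solver.

```python
import math
import random


def cvar(samples, alpha):
    """Empirical superquantile/CVaR at level alpha for equally likely samples."""
    tail = (1.0 - alpha) * len(samples)
    total, used = 0.0, 0.0
    for value in sorted(samples, reverse=True):
        weight = min(1.0, tail - used)
        if weight <= 0.0:
            break
        total += weight * value
        used += weight
    return total / tail


def minimize_convex_1d(f, lo, hi, iters=100):
    """Golden-section search for a minimizer of a convex function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc <= fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def g(x, v):
    # Hypothetical test function; Lipschitz in x with L(v) = 1 for every v.
    return abs(x - v)


random.seed(1)
n, alpha = 300, 0.9
x_center = 0.5
xs = [x_center + random.gauss(0.0, 0.2) for _ in range(n)]  # constructed X
vs = [random.gauss(0.0, 1.0) for _ in range(n)]             # realizations of V
gs = [g(x, v) for x, v in zip(xs, vs)]
mean_x = sum(xs) / n


def objective(c):
    # c * E[X] + R(g(X, V) - c X), minimized over c in the partial residual statistic.
    return c * mean_x + cvar([gv - c * x for gv, x in zip(gs, xs)], alpha)


c = minimize_convex_1d(objective, -50.0, 50.0)
c0_bar = cvar([gv - c * x for gv, x in zip(gs, xs)], alpha)
rho = 1.0 + abs(c)  # sup L(V) + |c|


def upper_bound(x):
    return c0_bar + c * x + rho * cvar([abs(xj - x) for xj in xs], alpha)


def true_risk(x):
    return cvar([g(x, v) for v in vs], alpha)


test_points = [-1.0, 0.0, 0.5, 1.0, 2.0]
gaps = [upper_bound(x) - true_risk(x) for x in test_points]
print(list(zip(test_points, gaps)))
```

The gaps are nonnegative at every test point and, in line with the discussion above, tend to be smallest near `x_center` where the penalty term is low.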

A possible approach for minimizing $\mathcal{R}(g(\cdot, V))$, relying on Theorem 5.2, would be to use in
generalized regression the observations $\{x^j, y^j\}_{j=1}^\nu$, where $x^j \in \mathbb{R}^n$, $y^j = g(x^j, v^j)$, and realizations $v^j$ of $V$,
$j = 1, \dots, \nu$, and a carefully selected distribution on $\{x^j\}_{j=1}^\nu$, centered near a current best solution $\hat x$, to
construct $c$ and $\bar c_0$ as stipulated in Theorem 5.2. The upper-bounding model $\bar c_0 + \langle c, \cdot\rangle + \rho\mathcal{R}(\|X - \cdot\|)$
could then be minimized leading to a new “best solution” $\hat x$. The process could be repeated, possibly
with an updated set of observations. Within such a framework, the term $\rho\mathcal{R}(\|X - \cdot\|)$ can be viewed
as a regularization of the affine model obtained through regression.

The minimization of the upper-bounding model amounts to a specific risk minimization problem.
In the particular case of the superquantile/CVaR risk measure at level $\alpha \in [0, 1)$ and realizations
$\{x^j\}_{j=1}^\nu$, with probabilities $\{p^j\}_{j=1}^\nu$, the minimization of that model is equivalent to the second-order
cone program:

$\min \ \langle c/\rho, x\rangle + z_0 + \frac{1}{1-\alpha}\sum_{j=1}^\nu p^j z_j$
subject to
$\|x^j - x\| - z_0 \le z_j$, $j = 1, \dots, \nu$,
$0 \le z_j$, $j = 1, \dots, \nu$,
$x \in \mathbb{R}^n$, $z \in \mathbb{R}^{\nu+1}$.

We observe that the constant $\bar c_0$ does not influence the optimal solutions of the upper-bounding model
and is therefore left out.
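The cone program rests on the standard minimization formula for the superquantile, in which $z_0$ plays the role of a quantile and $z_j$ the excess of $\|x^j - x\|$ over $z_0$. The sketch below checks this correspondence for a fixed $x$ in one dimension. The data points, the value `c_over_rho`, and the function names are hypothetical, and the equal probabilities are an assumption needed by the simple `cvar` helper.

```python
def cvar(samples, alpha):
    """Empirical superquantile/CVaR at level alpha for equally likely samples."""
    tail = (1.0 - alpha) * len(samples)
    total, used = 0.0, 0.0
    for value in sorted(samples, reverse=True):
        weight = min(1.0, tail - used)
        if weight <= 0.0:
            break
        total += weight * value
        used += weight
    return total / tail


def socp_objective(x, z0, z, c_over_rho, probs, alpha):
    """Objective of the cone program for scalar x."""
    return c_over_rho * x + z0 + sum(p * zj for p, zj in zip(probs, z)) / (1.0 - alpha)


def smallest_feasible_z(x, z0, points):
    """For fixed (x, z0), the constraints force z_j >= max(0, |x^j - x| - z0)."""
    return [max(0.0, abs(xj - x) - z0) for xj in points]


points = [0.0, 1.0, 2.0, 4.0]   # hypothetical realizations x^j
probs = [0.25] * 4              # equal probabilities p^j
alpha = 0.5
c_over_rho = 0.3
x = 1.5

distances = [abs(xj - x) for xj in points]

# For fixed x the objective is piecewise linear and convex in z0, so a
# minimizing z0 can be found among the observed distances.
value_via_socp = min(
    socp_objective(x, z0, smallest_feasible_z(x, z0, points), c_over_rho, probs, alpha)
    for z0 in distances
)
value_direct = c_over_rho * x + cvar(distances, alpha)
print(value_via_socp, value_direct)
```

Minimizing the same objective jointly over $x$, $z_0$, and $z$ would require a conic solver, which is outside the scope of this sketch.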

6  Duality  and  Robustness 

Conjugate  duality  theory  links  risk  measures  to  risk  envelopes  as  specified  in  (1).  As  we  see  in  this 
section,  parallel  connections  emerge  for  measures  of  residual  risk  that  also  lead  to  new  distributionally 
robust  optimization  models. 

6.1  Duality  of  Residual  Risk 

Dual  expressions  for  residual  risk  are  available  from  that  of  the  underlying  measure  of  risk. 
6.1 Theorem (dual expression of residual risk) Suppose that $X \in \mathcal{L}^2_n$ and $\mathcal{R}(\cdot|X)$ is a measure of
residual risk associated with a finite regular measure of risk $\mathcal{R}$, with conjugate $\mathcal{R}^*$. Then,

$\mathcal{R}(Y|X) = \sup_{Q \in \mathcal{Q}} \{ E[QY] - \mathcal{R}^*(Q) \mid E[QX] = E[X] \}$ for $Y \in \mathcal{L}^2$.


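Proof. Let $Y \in \mathcal{L}^2$ and $X \in \mathcal{L}^2_n$ be fixed. We start by constructing a perturbation. Let $F : \mathbb{R}^n \times \mathcal{L}^2 \to \mathbb{R}$
be given by

$F(c, U) = \langle c, E[X]\rangle + \mathcal{R}(Y - \langle c, X\rangle - U)$ for $c \in \mathbb{R}^n$, $U \in \mathcal{L}^2$,

which is convex and also finite because $\mathcal{R}$ is finite by assumption, and let $U \mapsto \varphi(U) := \inf_{c \in \mathbb{R}^n} F(c, U)$
be the associated optimal value function. Clearly, $\mathcal{R}(Y|X) = \varphi(0)$ by Theorem 3.4(i). Since $F$ is finite
(and also closed and convex), the functional $U \mapsto F(0, U)$ is continuous and, in particular, bounded
above on a neighborhood of 0. By [23, Theorem 18] it follows that $\varphi$ is also bounded above on a
neighborhood of 0.

Next, we consider the Lagrangian $\mathcal{K} : \mathbb{R}^n \times \mathcal{L}^2 \to [-\infty, \infty)$ given by

$\mathcal{K}(c, Q) = \inf_{U \in \mathcal{L}^2} \{ F(c, U) + E[QU] \}$ for $c \in \mathbb{R}^n$, $Q \in \mathcal{L}^2$,

and the perturbed dual function $\mathcal{G} : \mathcal{L}^2 \times \mathbb{R}^n \to [-\infty, \infty)$ given by

$\mathcal{G}(Q, v) = \inf_{c \in \mathbb{R}^n} \{ \mathcal{K}(c, Q) - \langle c, v\rangle \}$ for $Q \in \mathcal{L}^2$, $v \in \mathbb{R}^n$.

Then, the associated optimal value function of the dual problem is $v \mapsto \gamma(v) := \sup_{Q \in \mathcal{L}^2} \mathcal{G}(Q, v)$. By
[23, Theorem 17] it follows that $\varphi(0) = \gamma(0)$ because $\varphi$ is bounded above on a neighborhood of 0. The
conclusion then follows by writing out an expression for $\gamma(0)$. Specifically,

$$\begin{aligned}
\mathcal{G}(Q, 0) &= \inf_{c \in \mathbb{R}^n} \mathcal{K}(c, Q) \\
&= \inf_{c \in \mathbb{R}^n} \Big\{ \inf_{U \in \mathcal{L}^2} \{ F(c, U) + E[QU] \} \Big\} \\
&= \inf_{c \in \mathbb{R}^n} \Big\{ \inf_{U \in \mathcal{L}^2} \{ \langle c, E[X]\rangle + \mathcal{R}(Y - \langle c, X\rangle - U) + E[QU] \} \Big\} \\
&= \inf_{c \in \mathbb{R}^n} \Big\{ \langle c, E[X]\rangle - \sup_{U \in \mathcal{L}^2} \{ E[Q(-U)] - \mathcal{R}(Y - \langle c, X\rangle - U) \} \Big\} \\
&= \inf_{c \in \mathbb{R}^n} \Big\{ \langle c, E[X]\rangle + E[Q(Y - \langle c, X\rangle)] - \sup_{U \in \mathcal{L}^2} \{ E[QU] - \mathcal{R}(U) \} \Big\} \\
&= \inf_{c \in \mathbb{R}^n} \{ \langle c, E[X]\rangle + E[Q(Y - \langle c, X\rangle)] - \mathcal{R}^*(Q) \} \\
&= \inf_{c \in \mathbb{R}^n} \{ E[QY] - \mathcal{R}^*(Q) + \langle c, E[X] - E[QX]\rangle \} \\
&= E[QY] - \mathcal{R}^*(Q) \text{ if } E[X] = E[QX], \text{ and } \mathcal{G}(Q, 0) = -\infty \text{ otherwise,}
\end{aligned}$$

which results in the given formula. □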

The restriction of $\mathcal{Q}$ by the condition $E[QX] = E[X]$ is naturally interpreted as another “risk
envelope.”

6.2 Definition (residual risk envelope) For given $X \in \mathcal{L}^2_n$ and risk envelope $\mathcal{Q}$, the associated residual
risk envelope is defined as $\mathcal{Q}(X) = \{Q \in \mathcal{Q} \mid E[QX] = E[X]\}$.

Clearly, the subset $\{Q \in \mathcal{Q} \mid E[QX] = E[X]\}$ of a risk envelope $\mathcal{Q}$ is nonempty due to the fact that
$1 \in \mathcal{Q}$; see for example [26]. Consequently, $\mathcal{Q}(X)$ is a nonempty convex set, which is also closed if $\mathcal{Q}$ is
closed. The discussion of this “reduced” set in the context of stochastic ambiguity is the next topic.
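A four-outcome example makes the dual expression and the residual risk envelope tangible for the superquantile/CVaR risk measure, whose risk envelope is recalled in the Appendix and whose conjugate vanishes on that envelope by positive homogeneity. The sketch below is a hand-built illustration: the outcomes `X`, `Y`, the level `alpha`, and the candidate multiplier `Q` are hypothetical. It computes the primal value $\min_c \{c\,E[X] + \mathcal{R}(Y - cX)\}$ by a golden-section search and compares it with $E[QY]$ for a $Q$ in the residual risk envelope.

```python
import math


def cvar(samples, alpha):
    """Empirical superquantile/CVaR at level alpha for equally likely samples."""
    tail = (1.0 - alpha) * len(samples)
    total, used = 0.0, 0.0
    for value in sorted(samples, reverse=True):
        weight = min(1.0, tail - used)
        if weight <= 0.0:
            break
        total += weight * value
        used += weight
    return total / tail


def minimize_convex_1d(f, lo, hi, iters=100):
    """Golden-section search for a minimizer of a convex function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc <= fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def mean(values):
    return sum(values) / len(values)


alpha = 0.5
X = [-1.0, 1.0, -1.0, 1.0]   # four equally likely outcomes
Y = [0.0, 3.0, 1.0, 4.0]


def primal(c):
    return c * mean(X) + cvar([y - c * x for x, y in zip(X, Y)], alpha)


residual_risk = primal(minimize_convex_1d(primal, -10.0, 10.0))

# Candidate element of the residual risk envelope for CVaR:
# 0 <= Q <= 1/(1 - alpha), E[Q] = 1, and E[Q X] = E[X].
Q = [0.0, 0.0, 2.0, 2.0]
q_feasible = (
    all(0.0 <= q <= 1.0 / (1.0 - alpha) for q in Q)
    and abs(mean(Q) - 1.0) < 1e-12
    and abs(mean([q * x for q, x in zip(Q, X)]) - mean(X)) < 1e-12
)
dual_value = mean([q * y for q, y in zip(Q, Y)])

print(mean(Y), residual_risk, cvar(Y, alpha), dual_value, q_feasible)
```

Here the chosen `Q` attains the primal value, so it is optimal in the dual expression, and the residual risk 2.5 sits between the expectation 2.0 and the risk 3.5 of $Y$.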

6.2  Distributionally  Robust  Models 

We again return to the situation examined in Section 5.2 where the focus is on the parameterized
random variable $Y_x = g(x, V)$ defined in terms of a function $g : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$, with $V \in \mathcal{L}^2_m$. We
now, however, show that measures of residual risk give rise to a new class of distributionally robust
optimization models capturing decisions under ambiguity.

A risk-neutral decision maker might aim to select an $x \in \mathbb{R}^n$ such that the expected value of
$Y_x$ is minimized, possibly also considering various constraints. If risk averse, she might instead want
to minimize the risk of $Y_x$ as quantified by a regular measure of risk. It is well known that the
second problem might also arise for a risk-neutral decision maker under distributional uncertainty. In
fact, for every positively homogeneous, monotonic, and regular measure of risk, the dual expression
$\mathcal{R}(Y) = \sup_{Q \in \mathcal{Q}} E[QY]$ can be interpreted as computing the worst-case expectation of $Y$ over a set of
probability measures induced by $\mathcal{Q}$; see for example [3, 16, 19, 2, 18] for extensive discussions of such
optimization under stochastic ambiguity.

It is clear from Theorem 3.4 that the parameterized random variable $Y_x$, assumed to be in $\mathcal{L}^2$ for
all $x \in \mathbb{R}^n$, satisfies

$E[Y_x] \le \mathcal{R}(Y_x|V) \le \mathcal{R}(Y_x)$ for every $x \in \mathbb{R}^n$.

Here, we have shifted from $X$ to $V$ as the random vector that might help explain the primary random
variable of interest $Y_x$. In this setting, $x$ is simply a parameterization of that variable. We show next
that the problem of minimizing the residual risk, i.e., solving

$\min_{x \in \mathbb{R}^n} \mathcal{R}(Y_x|V)$, (15)

leads to a position between the distributional certainty in the expectation-minimization model and the
distributional robustness of a risk minimization model.

In view of Theorem 6.1, we see that when $Y_x \in \mathcal{L}^2$, $x \in \mathbb{R}^n$, $V \in \mathcal{L}^2_m$, and $\mathcal{R}(\cdot|V)$ is a measure of
residual risk associated with a positively homogeneous, finite, and regular measure of risk, the problem
(15) is equivalent to

$\min_{x \in \mathbb{R}^n} \Big\{ \sup_{Q \in \mathcal{Q}} \{ E[QY_x] \mid E[QV] = E[V] \} \Big\}$. (16)

Here, the supremum is taken over a smaller set than in the case of the distributionally robust model of
minimizing the risk of $Y_x$. In fact, the supremum is taken over the residual risk envelope $\mathcal{Q}(V)$. The
reduction is achieved in a particular manner, which is most easily understood when the risk measure is
monotonic: We recall that then $\mathcal{R}(Y_x) = \sup_{Q \in \mathcal{Q}} E[QY_x]$ is the expected cost of $Y_x$ for a decision maker
that only nominally believes the probability measure $P$ and considers a “worst-case” probability measure
as characterized by the risk envelope $\mathcal{Q}$. In contrast, $\mathcal{R}(Y_x|V) = \sup_{Q \in \mathcal{Q}} \{ E[QY_x] \mid E[QV] = E[V] \}$
is the worst-case expected cost for the decision maker if she is willing to believe that the nominal
probability measure $P$ at least assigns the correct mean to $V$, i.e., $E[V] = E_{P'}[V]$, with $P'$ being the
“true” probability measure on $\Omega$. Of course, $V$ can be artificially augmented to include terms like $V_i^2$
and even random variables that do not enter $g$ and therefore do not influence $Y_x$ directly. Consequently,

minimizing residual risk is equivalent to minimizing a distributionally robust model under
moment matching.

In view of Theorem 3.4, the solution of (15) benefits from the representation of residual risk in terms
of the associated measure of regret $\mathcal{V}$ and therefore amounts to solving

$\min_{c_0 \in \mathbb{R},\, c \in \mathbb{R}^m,\, x \in \mathbb{R}^n} \big\{ c_0 + \langle c, E[V]\rangle + \mathcal{V}(g(x, V) - [c_0 + \langle c, V\rangle]) \big\}$, (17)

which is convex if $g$ is linear in its first argument or if $g$ is convex in its first argument and $\mathcal{V}$ is
monotonic. Hence, residual risk gives rise to a tractable class of distributionally robust optimization
models that captures ambiguity about the underlying probability measure.


Appendix 

Three risk quadrangles are especially relevant due to their connections with known regression
techniques; see [26, 24, 25] for details:

Example 7: Mean risk quadrangle. For $\lambda > 0$, the choice $\mathcal{R}(Y) = E[Y] + \lambda\sigma(Y)$, where
$\sigma(Y) := \sqrt{E[(Y - E[Y])^2]}$, is a positively homogeneous and regular measure of risk. The corresponding
risk envelope is $\mathcal{Q} = \{Q = 1 + \lambda Z \mid \sqrt{E[Z^2]} \le 1, E[Z] = 0\}$, the regret is $\mathcal{V}(Y) = E[Y] + \lambda\sqrt{E[Y^2]}$, the
deviation is $\mathcal{D}(Y) = \lambda\sigma(Y)$, the error is $\mathcal{E}(Y) = \lambda\sqrt{E[Y^2]}$, and the statistic is $\mathcal{S}(Y) = \{E[Y]\}$, which of course
corresponds to least-squares regression.
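The relations within this quadrangle can be verified numerically on a small sample: minimizing $c_0 + \mathcal{V}(Y - c_0)$ over $c_0$ should return the risk $E[Y] + \lambda\sigma(Y)$, with the minimizer equal to the statistic $E[Y]$. The sketch below uses hypothetical data and a hypothetical value of `lam`, and a golden-section search in place of an analytical argument.

```python
import math


def minimize_convex_1d(f, lo, hi, iters=100):
    """Golden-section search for a minimizer of a convex function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc <= fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def mean(values):
    return sum(values) / len(values)


lam = 0.5
ys = [1.0, 2.0, 4.0, 7.0]  # equally likely outcomes of Y


def regret(values):
    # V(Y) = E[Y] + lam * sqrt(E[Y^2])
    return mean(values) + lam * math.sqrt(mean([v * v for v in values]))


def shifted_regret(c0):
    # c0 + V(Y - c0); its minimum over c0 is the risk R(Y).
    return c0 + regret([y - c0 for y in ys])


c0_star = minimize_convex_1d(shifted_regret, -100.0, 100.0)
risk_from_regret = shifted_regret(c0_star)

sigma = math.sqrt(mean([(y - mean(ys)) ** 2 for y in ys]))
risk_direct = mean(ys) + lam * sigma
print(c0_star, risk_from_regret, risk_direct)
```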

Example 8: Quantile risk quadrangle. We recall that the $\alpha$-quantile, $\alpha \in (0, 1)$, of a random
variable $Y$ is $G_Y(\alpha) := \min\{y \mid F_Y(y) \ge \alpha\}$, where $F_Y$ is the cumulative distribution function of $Y$. The
$\alpha$-superquantile is $\bar G_Y(\alpha) := \frac{1}{1-\alpha}\int_\alpha^1 G_Y(\beta)\,d\beta$. The measure of risk $\mathcal{R}(Y) = \bar G_Y(\alpha)$ is positively
homogeneous, monotonic, and regular, and gives the superquantile-risk/CVaR for $\alpha \in (0, 1)$. The risk
envelope is $\mathcal{Q} = \{Q \in \mathcal{L}^2 \mid 0 \le Q \le 1/(1-\alpha), E[Q] = 1\}$, the regret is $\mathcal{V}(Y) = E[\max\{0, Y\}]/(1-\alpha)$,
the deviation is $\mathcal{D}(Y) = \bar G_Y(\alpha) - E[Y]$, the error is $\mathcal{E}(Y) = E[\max\{0, Y\}]/(1-\alpha) - E[Y]$, and the
statistic is $\mathcal{S}(Y) = [G_Y(\alpha), G_Y^+(\alpha)]$, where $G_Y^+(\alpha)$ is the right-continuous companion of $G_Y(\alpha)$ defined by
$G_Y^+(\alpha) := \inf\{y \mid F_Y(y) > \alpha\}$. Quantile regression relies on this error measure.

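Example 9: Superquantile risk quadrangle. With the second-order $\alpha$-superquantile
$\bar{\bar G}_Y(\alpha) := \frac{1}{1-\alpha}\int_\alpha^1 \bar G_Y(\beta)\,d\beta$, for $\alpha \in [0, 1)$, the choice $\mathcal{R}(Y) = \bar{\bar G}_Y(\alpha)$ is a positively homogeneous, monotonic,
and regular measure of risk. The risk envelope is

$\mathcal{Q} = \mathrm{cl}\Big\{ Q \in \mathcal{L}^2 \ \Big|\ Q = \frac{1}{1-\alpha}\int_\alpha^1 q(\beta)\,d\beta,\ q \text{ an integrable selection from } \mathcal{Q}_\beta,\ \beta \in [\alpha, 1) \Big\}$,

where cl denotes closure and $\mathcal{Q}_\beta$ is the risk envelope of the quantile risk quadrangle at level $\beta$. The
regret is $\mathcal{V}(Y) = \frac{1}{1-\alpha}\int_0^1 \max\{0, \bar G_Y(\beta)\}\,d\beta$, the deviation is $\mathcal{D}(Y) = \frac{1}{1-\alpha}\int_\alpha^1 \bar G_Y(\beta)\,d\beta - E[Y]$, the
error is $\mathcal{E}(Y) = \frac{1}{1-\alpha}\int_0^1 \max\{0, \bar G_Y(\beta)\}\,d\beta - E[Y]$, and the statistic is $\mathcal{S}(Y) = \{\bar G_Y(\alpha)\}$. This error provides
the foundation for superquantile regression [21].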


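The risk quadrangles of these examples, with the corresponding statistic, are summarized in Table 1;
see [26] for many more examples.

functional \ name of risk quadrangle | mean ($\lambda > 0$) | quantile ($\alpha \in (0,1)$) | superquantile ($\alpha \in (0,1)$)
statistic $\mathcal{S}$ | $E[Y]$ | $[G_Y(\alpha), G_Y^+(\alpha)]$ | $\bar G_Y(\alpha)$
risk $\mathcal{R}$ | $E[Y] + \lambda\sigma(Y)$ | $\bar G_Y(\alpha)$ | $\bar{\bar G}_Y(\alpha)$
regret $\mathcal{V}$ | $E[Y] + \lambda\sqrt{E[Y^2]}$ | $\frac{1}{1-\alpha}E[\max\{0, Y\}]$ | $\frac{1}{1-\alpha}\int_0^1 \max\{0, \bar G_Y(\beta)\}\,d\beta$
deviation $\mathcal{D}$ | $\lambda\sigma(Y)$ | $\bar G_Y(\alpha) - E[Y]$ | $\bar{\bar G}_Y(\alpha) - E[Y]$
error $\mathcal{E}$ | $\lambda\sqrt{E[Y^2]}$ | $\frac{1}{1-\alpha}E[\max\{0, Y\}] - E[Y]$ | $\frac{1}{1-\alpha}\int_0^1 \max\{0, \bar G_Y(\beta)\}\,d\beta - E[Y]$

Table 1: Examples of risk quadrangles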


We next give examples of representable risk identifiers and use the notation $F_Y$ for the cumulative
distribution function of $Y$ and

$F_Y^-(y) := \lim_{y' \uparrow y} F_Y(y')$, $y \in \mathbb{R}$,

for its left-continuous “companion.”


Example 10: Representability of superquantile/CVaR risk identifiers. We recall that a risk
identifier $Q^Y$ corresponding to the superquantile/CVaR risk measure $\mathcal{R}(Y) = \frac{1}{1-\alpha}\int_\alpha^1 G_Y(\beta)\,d\beta$,
where $\alpha \in (0, 1)$ and $G_Y(\beta)$ is the $\beta$-quantile of $Y$, takes the form [25]: for a.e. $\omega \in \Omega$,

$$Q^Y(\omega) = \begin{cases}
\frac{1}{1-\alpha} & \text{if } Y(\omega) > G_Y(\alpha) \\
r^Y & \text{if } Y(\omega) = G_Y(\alpha) \text{ and } F_Y(Y(\omega)) - F_Y^-(Y(\omega)) > 0 \\
0 & \text{otherwise,}
\end{cases} \qquad (18)$$

where

$$r^Y = \frac{F_Y(G_Y(\alpha)) - \alpha}{(1-\alpha)\big(F_Y(G_Y(\alpha)) - F_Y^-(G_Y(\alpha))\big)}. \qquad (19)$$

In this case, we set

$$h(y) = \begin{cases}
\frac{1}{1-\alpha} & \text{if } y > G_Y(\alpha) \\
r^Y & \text{if } y = G_Y(\alpha) \text{ and } F_Y(y) - F_Y^-(y) > 0 \\
0 & \text{otherwise,}
\end{cases}$$

which is Borel-measurable. Moreover, $h(Y(\omega)) = Q^Y(\omega)$. Consequently, for any $Y \in \mathcal{L}^2$, there exists a
representable risk identifier $Q^Y$ for superquantile/CVaR risk measures.
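For a discrete random variable the risk identifier in (18) and (19) can be evaluated directly. The sketch below uses a hypothetical sample with an atom at the quantile and checks the two properties one expects of a risk identifier for this risk measure: $E[Q^Y] = 1$ and $E[Q^Y Y]$ equal to the superquantile. The sample, the level `alpha`, and the function names are illustrative assumptions.

```python
def cvar(samples, alpha):
    """Empirical superquantile/CVaR at level alpha for equally likely samples."""
    tail = (1.0 - alpha) * len(samples)
    total, used = 0.0, 0.0
    for value in sorted(samples, reverse=True):
        weight = min(1.0, tail - used)
        if weight <= 0.0:
            break
        total += weight * value
        used += weight
    return total / tail


def cvar_risk_identifier(samples, alpha):
    """Evaluate h(Y(omega)) from (18)-(19) for equally likely outcomes."""
    n = len(samples)

    def cdf(y):        # F_Y(y)
        return sum(1 for s in samples if s <= y) / n

    def cdf_left(y):   # left-continuous companion F_Y^-(y)
        return sum(1 for s in samples if s < y) / n

    q = min(y for y in samples if cdf(y) >= alpha)   # alpha-quantile G_Y(alpha)
    r = (cdf(q) - alpha) / ((1.0 - alpha) * (cdf(q) - cdf_left(q)))

    def h(y):
        if y > q:
            return 1.0 / (1.0 - alpha)
        if y == q:     # every sample value is an atom of the empirical distribution
            return r
        return 0.0

    return [h(s) for s in samples]


alpha = 0.5
ys = [1.0, 2.0, 2.0, 3.0, 5.0]   # hypothetical equally likely outcomes
Q = cvar_risk_identifier(ys, alpha)

mean_Q = sum(Q) / len(Q)
mean_QY = sum(q * y for q, y in zip(Q, ys)) / len(ys)
print(Q, mean_Q, mean_QY, cvar(ys, alpha))
```

In this example the quantile is 2, the weight on the atom is $r^Y = 0.5$, and both $E[Q^Y Y]$ and the superquantile equal 3.6.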


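Example 11: Representability of second-order superquantile risk identifiers. We find that
a risk identifier $Q^Y$ corresponding to the second-order superquantile risk measure $\mathcal{R}(Y) = \frac{1}{1-\alpha}\int_\alpha^1 \bar G_Y(\beta)\,d\beta$,
where $\alpha \in [0, 1)$ and $\bar G_Y(\beta)$ is the $\beta$-superquantile of $Y$, i.e., the CVaR of $Y$ at level
$\beta$, takes the form [25]: for a.e. $\omega \in \Omega$,

$$Q^Y(\omega) = \begin{cases}
\frac{1}{1-\alpha}\log\frac{1-\alpha}{1-F(\omega)} & \text{if } \alpha < f(\omega) = F(\omega) < 1 \\
\frac{1}{1-\alpha}\Big[\log\frac{1-\alpha}{1-f(\omega)} + 1 + \frac{1-F(\omega)}{F(\omega)-f(\omega)}\log\frac{1-F(\omega)}{1-f(\omega)}\Big] & \text{if } \alpha < f(\omega) < F(\omega) \\
\frac{1}{1-\alpha}\Big[\frac{F(\omega)-\alpha}{F(\omega)-f(\omega)} + \frac{1-F(\omega)}{F(\omega)-f(\omega)}\log\frac{1-F(\omega)}{1-\alpha}\Big] & \text{if } f(\omega) \le \alpha \le F(\omega) \text{ and } f(\omega) < F(\omega) \\
0 & \text{otherwise,}
\end{cases}$$

where $F(\omega) = F_Y(Y(\omega))$ and $f(\omega) = F_Y^-(Y(\omega))$. In this case, we set

$$h(y) = \begin{cases}
\frac{1}{1-\alpha}\log\frac{1-\alpha}{1-F(y)} & \text{if } \alpha < f(y) = F(y) < 1 \\
\frac{1}{1-\alpha}\Big[\log\frac{1-\alpha}{1-f(y)} + 1 + \frac{1-F(y)}{F(y)-f(y)}\log\frac{1-F(y)}{1-f(y)}\Big] & \text{if } \alpha < f(y) < F(y) \\
\frac{1}{1-\alpha}\Big[\frac{F(y)-\alpha}{F(y)-f(y)} + \frac{1-F(y)}{F(y)-f(y)}\log\frac{1-F(y)}{1-\alpha}\Big] & \text{if } f(y) \le \alpha \le F(y) \text{ and } f(y) < F(y) \\
0 & \text{otherwise,}
\end{cases}$$

where now $F(y) = F_Y(y)$ and $f(y) = F_Y^-(y)$, which is Borel-measurable. Moreover, $h(Y(\omega)) = Q^Y(\omega)$.
Consequently, for any $Y \in \mathcal{L}^2$, there exists a representable risk identifier $Q^Y$ for second-order
superquantile risk measures.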